1 Introduction

Many models in mechanics and physics are described by dynamical systems with constraints. If the constraints do not originate from position constraints (so-called holonomic constraints), the system is called nonholonomic. A typical example is a disc rolling on a surface without slipping. The governing differential equations are obtained from the Lagrange–d’Alembert principle (see Sect. 2), which, in contrast to Hamilton’s principle in Lagrangian mechanics, is not a variational principle. Therefore, nonholonomic systems do not, in general, preserve a symplectic structure, although total energy is conserved (see Footnote 1). Motivated by the success of symplectic integrators for Hamiltonian systems, various discrete versions of the Lagrange–d’Alembert principle have nevertheless been suggested, with the objective of deriving “structure preserving” time-stepping algorithms for nonholonomic systems [6,7,8, 10, 14, 15, 17]. Such algorithms are often called nonholonomic integrators; they are observed to nearly conserve first integrals when applied to some standard nonholonomic test problems. This near conservation is often attributed to the discrete Lagrange–d’Alembert principle.

In this paper we give numerical and theoretical results suggesting that reversibility lies behind the observed good behaviour of nonholonomic integrators, regardless of any underlying discrete Lagrange–d’Alembert principle. Our results in fact reveal that the standard nonholonomic test problems have a bias: they are all reversible integrable. We therefore construct a family of nonholonomic test problems that are still integrable but not reversible (that is, not reversible with respect to a standard reversibility map). We apply several nonholonomic integrators from the literature to the new test problems. None of them exhibits structure preservation on all problems.

The underlying philosophy of nonholonomic integrators is well summarised by Cortés Monforte and Martínez [7, § 1] as follows: “by respecting the geometric structure of nonholonomic systems, one can create integrators capturing the essential features of this kind of systems.” Because of our limited geometric understanding of nonholonomic dynamics, it is, however, unclear whether nonholonomic integrators possess any special properties that make them superior to other methods (in the same sense as symplectic integrators for Hamiltonian systems possess properties that make them superior to non-symplectic methods). Except for exact conservation of momentum maps corresponding to horizontal symmetries [7, § 5], there are no theoretical results pertaining to structure preserving properties of nonholonomic integrators (by contrast, the excellent long-time behaviour of symplectic integrators for Hamiltonian systems and of reversible integrators for reversible systems is fully explained by KAM theory in combination with backward error analysis; see Hairer, Lubich, and Wanner [12] and references therein). Nonholonomic integrators nevertheless often display “very good energy behaviour” [9, § 1]. Such statements are based on experimental evidence, namely how well the integrators perform on standard test problems. Thus, in practice, a nonholonomic integrator is deemed “structure preserving” if it nearly conserves the first integrals of such test problems over long integration times. With this definition, if we extend the test problem suite to include our unbiased nonholonomic problems, then, as far as we know, there are no structure preserving nonholonomic integrators (every tested integrator fails to be structure preserving for some of the problems). Our aim is that the new set of unbiased test problems will serve the community as a more complete suite for evaluating structure preservation of nonholonomic integrators.

We now summarize the contributions in the paper:

  • We show that five classical nonholonomic test problems are part of a larger family of nonholonomically coupled systems (Sects. 2, 2.2).

  • We show that a subset of nonholonomically coupled systems (including many classical test problems) is fibrated over reversible integrable systems (Theorem 3.3, Sects. 4, 5). We thereby obtain a new family of reversible integrable nonholonomic systems that extends existing ones.

  • We use Theorem 3.3 together with the reversible KAM theorem to explain the excellent long-time behaviour of nonholonomic integrators observed experimentally (Sect. 3.3).

  • We propose new, perturbed test problems for nonholonomic integrators (Sect. 2.2). While still integrable, these problems are not reversible and therefore avoid experimental bias.

  • We carry out numerical experiments with five commonly used nonholonomic integrators on both reversible (biased) and non-reversible (unbiased) test problems (Sect. 3). The behaviour in the numerical simulations is consistent with our predictions from the theory (Sect. 3.4). (One of the methods still yields good long-time behaviour on one of the unbiased problems, for reasons we cannot yet explain, but this method fails on other test problems.)

The paper is organised as follows. In Sect. 2 we describe a family of nonholonomic systems; this family contains several of the classical nonholonomic systems in the literature. In Sect. 3 we run numerical experiments on some particular systems in that family and measure the conservation of their integrals. In Sect. 3.3 we show how reversible KAM theory in combination with Theorem 3.3 explains the near conservation of integrals. In Sect. 3.4 we then compare the theory with the numerical simulations. In Sects. 4 and 5 we prove the results leading to Theorem 3.3.

2 Nonholonomically coupled systems

The Lagrange–d’Alembert principle states that a motion curve \(\varvec{q}(t)\), for a system with Lagrangian function \(\mathcal {L}(\varvec{q},\dot{\varvec{q}})\) and nonholonomic constraints \(A(\varvec{q})\dot{\varvec{q}}= 0\) fulfils

$$\begin{aligned} \delta \int _a^{b} \mathcal {L}(\varvec{q}(t),\dot{\varvec{q}}(t)) \,\mathrm {d}t = 0, \end{aligned}$$

for all virtual displacements \(\delta \varvec{q}\) that satisfy \(A(\varvec{q}(t))\delta \varvec{q}(t) = 0\) and vanish at the endpoints \(t=a\) and \(t=b\). Throughout this paper we assume that the Lagrangian is of the form

$$\begin{aligned} \mathcal {L}(\varvec{q},\dot{\varvec{q}}) = \frac{1}{2}\dot{\varvec{q}}^\top \dot{\varvec{q}} - V(\varvec{q}), \end{aligned}$$

for a potential function V. The equations of motion are then given by

$$\begin{aligned} \begin{aligned} \ddot{\varvec{q}}&= - \nabla V(\varvec{q}) + A(\varvec{q})^\top \varvec{\lambda } \\ 0&= A(\varvec{q})\dot{\varvec{q}} \end{aligned} \end{aligned}$$
(1)

where \(\varvec{\lambda }\in {\mathbb R}^{r}\) are the Lagrange multipliers and r is the number of constraints, i.e., the number of rows of \(A(\varvec{q})\) (see [4, 6] for details).

As we shall see in Sect. 2.2, the continuously variable transmission, the nonholonomic oscillator, the nonholonomic particle, the knife edge, and the vertical rolling disk all belong to a larger family of nonholonomically coupled systems, in which two independent subsystems are coupled through the constraints.

Definition 2.1

A nonholonomically coupled system is a nonholonomic system with Lagrangian of the form

$$\begin{aligned} \mathcal {L}(\underbrace{\varvec{x},\xi }_{\varvec{q}},\underbrace{\dot{\varvec{x}},\dot{\xi }}_{\dot{\varvec{q}}}) = \frac{1}{2} \dot{\varvec{x}}^{\textsf {T}}\dot{\varvec{x}}+ \frac{1}{2} \dot{\xi }^2 - U(\varvec{x})- {V}(\xi ) , \qquad \varvec{x}\in {\mathbb R}^{n-1}, \;\xi \in {\mathbb R}, \end{aligned}$$

and constraints of the form

$$\begin{aligned} {A}(\xi ) \,\dot{\varvec{x}}= 0, \end{aligned}$$
(2)

where for any \(\xi \), the matrix \({A}(\xi )\) has a kernel of dimension one. The \((\xi ,\dot{\xi })\) subsystem is called the driving system.

Remark 2.2

Note that, in the examples of Sect. 2.2, some components of \(\varvec{x}\), or \(\xi \), may be periodic (see Table 1). In the rest of the paper we will nevertheless assume that \(\varvec{x}\in {\mathbb R}^{n-1}\) and \(\xi \in {\mathbb R}\), for the sake of simplicity.

Remark 2.3

Note that the matrix A in Eq. (2) is not the same as the one appearing in (1). The difference is that A now depends only on \(\xi \) and acts only on \(\dot{\varvec{x}}\). This slight abuse of notation should not cause confusion.

Notice that the driving system is a self-contained unconstrained Lagrangian system. As indicated, one may think of it as the “driver” for the remaining system.

We write the total energy H as

$$\begin{aligned} H(\varvec{x},\xi ,\dot{\varvec{x}},\dot{\xi }) = \frac{1}{2} \dot{\varvec{x}}^{\textsf {T}}\dot{\varvec{x}}+ U(\varvec{x}) + h(\xi ,\dot{\xi }) , \end{aligned}$$
(3)

where \(h(\xi , \dot{\xi }):=\frac{1}{2}\dot{\xi }^2 + {V}(\xi )\) is the energy of the driving system.

Given a nonholonomically coupled system, let \(\varvec{k}(\xi )\) be a kernel vector of \({A}(\xi )\) such that \(\Vert \varvec{k}(\xi )\Vert = 1\). We then define

$$\begin{aligned} v:=\dot{\varvec{x}}^{\textsf {T}}\varvec{k}(\xi ) . \end{aligned}$$

Since \(\varvec{k}(\xi )\) spans the kernel of \({A}(\xi )\), it follows from the constraint equation (2) that \(\dot{\varvec{x}}= v\varvec{k}(\xi )\). Moreover, since \(\Vert \varvec{k}(\xi )\Vert = 1\), we have \( \frac{1}{2}\dot{\varvec{x}}^{\textsf {T}}\dot{\varvec{x}}= \frac{1}{2} v^2 \), and hence

$$\begin{aligned} H(\varvec{x},\xi ,\dot{\varvec{x}},\dot{\xi }) = \frac{1}{2}v^2 + U(\varvec{x}) + h(\xi ,\dot{\xi }) . \end{aligned}$$

Both the total energy H and the energy \(h\) of the driving system are first integrals, so we obtain that

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}H&= \frac{\mathrm {d}}{\mathrm {d}t}\left( \frac{1}{2} v^2 + U(\varvec{x}) \right) + \underbrace{\frac{\mathrm {d}}{\mathrm {d}t} h}_{0}\\&= v\frac{\mathrm {d}v}{\mathrm {d}t} - \varvec{F}(\varvec{x})^{\textsf {T}}\dot{\varvec{x}} = 0, \end{aligned}$$

where \(\varvec{F}(\varvec{x}) :=-\nabla U(\varvec{x})\). We now use \(\dot{\varvec{x}} = v\varvec{k}(\xi )\), which, if \(v\ne 0\), gives

$$\begin{aligned} \dot{v} = {\varvec{k}(\xi )}^{\textsf {T}}\varvec{F}(\varvec{x}) . \end{aligned}$$
(4)

Note that, even though this derivation assumes \(v\ne 0\), one can check that (4) is still valid when \(v=0\) by computing the Lagrange multiplier directly from the equations of motion (1).

The equations of motion are thus

$$\begin{aligned} \dot{\varvec{x}}&= \varvec{k}(\xi )v\\ \dot{v}&= \varvec{k}(\xi )^{\textsf {T}}\varvec{F}(\varvec{x}) \end{aligned}$$

where \(\xi \) is a solution of the independent Lagrangian system (the driving system in Definition 2.1)

$$\begin{aligned} \ddot{\xi }+ {V}'(\xi ) = 0 . \end{aligned}$$
(5a)

Thus, every nonholonomically coupled system can be reduced to a system of ordinary differential equations of the form (5), with first integrals given by the passenger energy

$$\begin{aligned} E(\varvec{x},v) :=\frac{1}{2}v^2 + U(\varvec{x}) \end{aligned}$$
(6)

and the driver energy

$$\begin{aligned} h(\xi ,\dot{\xi }) = \frac{1}{2}\dot{\xi }^2 + {V}(\xi ). \end{aligned}$$
(7)

Notice that the total energy (3) is the sum of the driver and passenger energies.
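The reduced system (5) is easy to simulate directly. The following Python sketch is only an illustration: it uses the classical Runge–Kutta method as a reference solver, not any of the nonholonomic integrators discussed in Sect. 3. The concrete data (the harmonically driven CVT of Sect. 2.2.1, with \(U(\varvec{x}) = \frac{1}{2}\varvec{x}^{\textsf {T}}\varvec{x}\) and \({V}(\xi ) = \xi ^2/2\)) and the helper names `k`, `force`, `rhs` and `rk4_step` are assumptions made for the example. Up to the small Runge–Kutta error, both the passenger energy (6) and the driver energy (7) stay constant.

```python
import numpy as np

# Assumed example data: harmonically driven CVT (Sect. 2.2.1), i.e.
# U(x) = |x|^2 / 2, V(xi) = xi^2 / 2 and constraint A(xi) = [1, xi].
def k(xi):
    """Unit kernel vector k(xi) of A(xi) = [1, xi]."""
    return np.array([-xi, 1.0]) / np.sqrt(1.0 + xi**2)

def force(x):
    """F(x) = -grad U(x) for U(x) = |x|^2 / 2."""
    return -x

def dV(xi):
    """V'(xi) for V(xi) = xi^2 / 2."""
    return xi

def rhs(s):
    """Right-hand side of (5) and (5a); s = (x_1, x_2, v, xi, xi_dot)."""
    x, v, xi, xi_dot = s[:2], s[2], s[3], s[4]
    x_dot = k(xi) * v            # x' = k(xi) v
    v_dot = k(xi) @ force(x)     # v' = k(xi)^T F(x), Eq. (4)
    return np.concatenate([x_dot, [v_dot, xi_dot, -dV(xi)]])

def rk4_step(s, dt):
    """One classical Runge-Kutta step (reference solver only)."""
    a = rhs(s)
    b = rhs(s + 0.5 * dt * a)
    c = rhs(s + 0.5 * dt * b)
    d = rhs(s + dt * c)
    return s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)

def passenger_energy(s):
    """E = v^2 / 2 + U(x), Eq. (6)."""
    return 0.5 * s[2]**2 + 0.5 * s[:2] @ s[:2]

def driver_energy(s):
    """h = xi_dot^2 / 2 + V(xi), Eq. (7)."""
    return 0.5 * s[4]**2 + 0.5 * s[3]**2

s = np.array([1.0, 1.0, 0.0, 0.0, 1.0])
E0, h0 = passenger_energy(s), driver_energy(s)
for _ in range(10_000):          # t in [0, 100] with dt = 0.01
    s = rk4_step(s, 0.01)
print(abs(passenger_energy(s) - E0), abs(driver_energy(s) - h0))
```

Swapping in another driver or another kernel vector only requires changing `k`, `force` and `dV`; the structure of `rhs` mirrors (5) line by line.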

2.1 Quadratic potentials

We now assume that the potential U is quadratic, i.e.,

$$\begin{aligned} U(\varvec{x}) = \frac{1}{2} \varvec{x}^{\textsf {T}}K\varvec{x}- \varvec{f}^{\textsf {T}}\varvec{x}, \end{aligned}$$

where \(K\) is a symmetric positive semi-definite matrix and \(\varvec{f}\in {\mathbb R}^{n-1}\) is a constant vector. The corresponding force \(\varvec{F}(\varvec{x}) = -\frac{\partial U}{\partial \varvec{x}}\) is given by

$$\begin{aligned} \varvec{F}(\varvec{x}) = -K\varvec{x}+ \varvec{f}. \end{aligned}$$

Using the spectral decomposition of \(K\) and discarding the eigenvectors corresponding to zero eigenvalues, we can factorise \(K\) as

$$\begin{aligned} K= \kappa ^{\textsf {T}}\kappa , \end{aligned}$$

for a rectangular \(m\times (n-1)\) matrix \(\kappa \) of full row rank \(m = \mathrm {rank}\,K\). We define

$$\begin{aligned} \varvec{y}&:=\kappa \varvec{x},\\ \varvec{\varvec{k}}_{0}(\xi )&:=\kappa \varvec{k}(\xi ), \end{aligned}$$

and the projection

$$\begin{aligned} \pi (\varvec{x},v,\xi ,\dot{\xi }) :=(\kappa \varvec{x}, v,\xi ,\dot{\xi }) = (\varvec{y},v,\xi ,\dot{\xi }). \end{aligned}$$
(8)
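The factorisation \(K = \kappa ^{\textsf {T}}\kappa \) can be computed with a symmetric eigensolver. The sketch below assumes NumPy's `numpy.linalg.eigh` and a hypothetical tolerance parameter `tol` that decides which eigenvalues count as zero. Note that \(\kappa \) is unique only up to left multiplication by an orthogonal \(m\times m\) matrix (for instance a sign change), which does not affect \(K\).

```python
import numpy as np

def factor_psd(K, tol=1e-12):
    """Return kappa (m x (n-1), full row rank m = rank K) with K = kappa^T kappa.

    Spectral decomposition K = Q diag(lam) Q^T; eigenpairs with eigenvalue
    below tol * max(1, max|lam|) are treated as zero and discarded.
    """
    lam, Q = np.linalg.eigh(K)
    scale = max(1.0, np.abs(lam).max())
    keep = lam > tol * scale
    # Row i of kappa is sqrt(lam_i) * q_i^T for each retained eigenpair.
    return np.sqrt(lam[keep])[:, None] * Q[:, keep].T

# Example: stiffness of the nonholonomic particle (only x_2 has a spring).
K = np.array([[0.0, 0.0],
              [0.0, 1.0]])
kappa = factor_psd(K)
print(kappa.shape, np.allclose(kappa.T @ kappa, K))
```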

From (5) we have \(\dot{v} = {\varvec{k}(\xi )^{\textsf {T}}} (-\kappa ^{\textsf {T}}\kappa \varvec{x}+ \varvec{f})\), so we get \(\dot{v} = -\varvec{\varvec{k}}_{0}(\xi )^{\textsf {T}}\varvec{y}+ \varvec{k}(\xi )^{\textsf {T}}\varvec{f}\). The projection \(\pi \) therefore intertwines the original system (5) with the reduced system

$$\begin{aligned} \begin{aligned} \dot{\varvec{y}}&= \varvec{\varvec{k}}_{0}(\xi ) v\\ \dot{v}&= -\varvec{\varvec{k}}_{0}(\xi )^{\textsf {T}}\varvec{y}+ \varvec{k}(\xi )^{\textsf {T}}{\varvec{f}} \end{aligned} \end{aligned}$$
(9)

where again, \(\xi \) fulfills equation (5a). There is thus a stack of three systems above one another, summarised by the following chain of projections:

$$\begin{aligned} (\varvec{x},v,\xi ,\dot{\xi }) \longmapsto (\varvec{y},v,\xi ,\dot{\xi }) \longmapsto (\xi ,\dot{\xi }). \end{aligned}$$
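Because the vector fields (5) and (9) are \(\pi \)-related and \(\pi \) is linear, any explicit Runge–Kutta method commutes exactly with \(\pi \): stepping the full system and then projecting gives the same result as projecting and then stepping the reduced system (a discrete analogue of the descent property \(\Psi _h \circ \pi = \pi \circ \Phi _h\) used in Sect. 3.3). The hedged sketch below checks this numerically with RK4. The choices \(\kappa = [0\;\, 1]\), the load \(\varvec{f} = (0.3, 0)\) and the harmonic driver are illustrative assumptions, not data from the experiments of Sect. 3.

```python
import numpy as np

# Assumed example data: kappa = [0 1] (as for the nonholonomic particle),
# K = kappa^T kappa, a hypothetical constant load f, harmonic driver V = xi^2/2.
kappa = np.array([[0.0, 1.0]])
K = kappa.T @ kappa
f = np.array([0.3, 0.0])

def k(xi):
    return np.array([-xi, 1.0]) / np.sqrt(1.0 + xi**2)

def rhs_full(s):
    """System (5) with F(x) = -K x + f; s = (x_1, x_2, v, xi, xi_dot)."""
    x, v, xi, xi_dot = s[:2], s[2], s[3], s[4]
    return np.concatenate([k(xi) * v, [k(xi) @ (-K @ x + f), xi_dot, -xi]])

def rhs_reduced(s):
    """System (9); s = (y, v, xi, xi_dot) with y = kappa x (here m = 1)."""
    y, v, xi, xi_dot = s[:1], s[1], s[2], s[3]
    k0 = kappa @ k(xi)
    return np.concatenate([k0 * v, [-k0 @ y + k(xi) @ f, xi_dot, -xi]])

def rk4(rhs, s, dt, n_steps):
    for _ in range(n_steps):
        a = rhs(s)
        b = rhs(s + 0.5 * dt * a)
        c = rhs(s + 0.5 * dt * b)
        d = rhs(s + dt * c)
        s = s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
    return s

def project(s):
    """Projection pi of Eq. (8): (x, v, xi, xi_dot) -> (kappa x, v, xi, xi_dot)."""
    return np.concatenate([kappa @ s[:2], s[2:]])

full0 = np.array([0.5, -0.2, 0.1, 0.0, 1.0])
full_T = rk4(rhs_full, full0, 0.01, 1000)
red_T = rk4(rhs_reduced, project(full0), 0.01, 1000)
gap = np.max(np.abs(project(full_T) - red_T))
print(gap)   # round-off level: RK4 commutes with the linear map pi
```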

The system (9) can be written in matrix notation, using an auxiliary variable \(\varepsilon \) with initial condition \(\varepsilon (0) = 1\) (which then remains constant), as

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} \begin{bmatrix} \varvec{y}\\ v\\ \varepsilon \end{bmatrix} = \begin{bmatrix} 0 &{}\quad \varvec{\varvec{k}}_{0}(\xi ) &{}\quad 0 \\ -\varvec{\varvec{k}}_{0}(\xi )^{\textsf {T}}&{}\quad 0 &{}\quad \varvec{k}(\xi )^{\textsf {T}}{\varvec{f}}\\ 0 &{}\quad 0 &{}\quad 0 \end{bmatrix} \,\begin{bmatrix} \varvec{y}\\ v\\ \varepsilon \end{bmatrix} \end{aligned}$$
(10)

We observe that the matrix in (10) is an element of \(\mathfrak {se}(m+1)\), the Lie algebra of the semidirect product Lie group \(\mathrm {SO}(m+1) \ltimes {\mathbb R}^{m+1}\), where m is the rank of \(\kappa \). If \(\varvec{f}\) is zero, the group reduces to \(\mathrm {SO}(m+1)\). If \(m = 0\), the group reduces to \({\mathbb R}\). The Lie algebra structure of Eq. (10) is central for the reversibility analysis in Sect. 4.
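This structure can be checked mechanically: the upper-left \((m+1)\times (m+1)\) block of the matrix in (10) is skew-symmetric and its last row vanishes, which is exactly membership in \(\mathfrak {se}(m+1)\). In the sketch below, `evolution_matrix` and `in_se` are hypothetical helper names, and the load vector used in the example is made up.

```python
import numpy as np

def evolution_matrix(k_xi, kappa, f):
    """Assemble the (m+2) x (m+2) matrix of Eq. (10) at a fixed xi.

    k_xi  : unit kernel vector k(xi), shape (n-1,)
    kappa : factor of K = kappa^T kappa, shape (m, n-1)
    f     : constant load vector, shape (n-1,)
    """
    k0 = kappa @ k_xi
    m = k0.shape[0]
    M = np.zeros((m + 2, m + 2))
    M[:m, m] = k0             # y' = k0 v
    M[m, :m] = -k0            # v' = -k0^T y + (k^T f) eps
    M[m, m + 1] = k_xi @ f
    return M                  # last row stays zero: eps' = 0

def in_se(M):
    """Check membership in se(m+1): skew upper-left block, zero last row."""
    B = M[:-1, :-1]
    return np.allclose(B, -B.T) and np.allclose(M[-1], 0.0)

xi = 0.7
k_xi = np.array([-xi, 1.0]) / np.sqrt(1.0 + xi**2)    # CVT kernel vector
M = evolution_matrix(k_xi, np.eye(2), np.array([0.2, -0.1]))
print(M.shape, in_se(M))
```

The same helper covers the degenerate cases: passing an empty \(0\times (n-1)\) matrix for \(\kappa \) (as for the knife edge) yields a \(2\times 2\) matrix, i.e., the group \({\mathbb R}\).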

2.2 Examples

Here we give several examples of nonholonomic systems of the form presented in Sect. 2.1. The standard forms of these problems are used in the literature as test problems for nonholonomic integrators. We also suggest new modifications of the standard systems, constructed so that they fail to be reversible integrable (as detailed in Sect. 4).

A summary of the problems in this section in terms of the symbols in Sect. 2 is given in Table 1.

Table 1 Summary of the nonholonomically coupled systems presented in Sect. 2. F is the dimension of the kernel of \(\kappa \), and hence the dimension of the fibres of the map \(\pi \) defined by Eq. (8). I is the number of first integrals. \(\theta \) is the number of angle variables. \(\rho \) is the map used to define the reversibility map in (23)

2.2.1 CVT and nonholonomic particle

The continuously variable transmission (CVT) problem is a family of nonholonomically coupled systems of the form in Sect. 2.1 with

$$\begin{aligned} \varvec{f}= 0 \end{aligned}$$

and

$$\begin{aligned} \kappa = \begin{bmatrix} 1 &{}\quad 0 \\ 0 &{}\quad 1 \end{bmatrix} . \end{aligned}$$

It is a simple model for a variable transmission gearbox, as illustrated in Fig. 1. The driving system determines the gear ratio. We consider different driver systems.

The first is the harmonically driven CVT, in which the driver is a harmonic oscillator, i.e., \({V}(\xi ) = \xi ^2/2\). The nonholonomic constraint is then

$$\begin{aligned} {A}(\xi ) = \begin{bmatrix} 1&\quad \xi \end{bmatrix}. \end{aligned}$$

Under the name contact oscillator this case is considered as a test problem in [17]. The system is shown to be reversible integrable in [18]; together with reversible KAM theory this gives a theoretical explanation of the excellent numerical results observed in [17].

Fig. 1

Illustration of the principle of the continuously variable transmission (CVT) gearbox. The driving subsystem \((\xi ,\dot{\xi })\) determines the location of the belt, which in turn determines the gear ratio between the shafts. A nonholonomic system describing the motion is given in Sect. 2.2.1

The vector spanning the kernel of \({A}(\xi )\) is

$$\begin{aligned} \varvec{k}(\xi ) = \frac{1}{\sqrt{1+\xi ^2}}\begin{bmatrix}-\xi \\ 1\end{bmatrix} . \end{aligned}$$

Since \(\varvec{f}=0\) and \(\kappa \) is the identity, the evolution matrix (10) is in this case

$$\begin{aligned} \begin{bmatrix} 0 &{}\quad \varvec{k}(\xi )\\ -\varvec{k}(\xi )^{\textsf {T}}&{}\quad 0 \end{bmatrix} , \end{aligned}$$

so the matrix subalgebra is \(\mathfrak {so}(3)\).
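A quick numerical sanity check of the closed-form kernel vector is sketched below. It assumes NumPy and uses the last right singular vector returned by `numpy.linalg.svd` as an independent unit kernel vector; since kernel vectors are defined only up to sign, the comparison uses the absolute value of the inner product. The helper names are hypothetical.

```python
import numpy as np

def unit_kernel(A):
    """Unit vector spanning the (assumed one-dimensional) kernel of A."""
    _, _, Vt = np.linalg.svd(np.atleast_2d(A))
    return Vt[-1]              # right singular vector of the zero singular value

def max_kernel_error(A_of_xi, k_of_xi, xis):
    """Largest violation of A k = 0, |k| = 1 and agreement with the SVD kernel."""
    err = 0.0
    for xi in xis:
        A, k = A_of_xi(xi), k_of_xi(xi)
        err = max(err,
                  np.abs(A @ k).max(),
                  abs(np.linalg.norm(k) - 1.0),
                  abs(abs(unit_kernel(A) @ k) - 1.0))
    return err

def A_harmonic(xi):
    return np.array([[1.0, xi]])

def k_harmonic(xi):
    return np.array([-xi, 1.0]) / np.sqrt(1.0 + xi**2)

xis = np.linspace(-3.0, 3.0, 61)
print(max_kernel_error(A_harmonic, k_harmonic, xis))   # round-off level
```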

The second type of driver system is the pendulum driven CVT. Here, the gear ratio (driving system) is governed by a nonlinear pendulum instead of the harmonic oscillator. The potential for the nonlinear pendulum is now

$$\begin{aligned} {V}(\xi ) = \cos (\xi ) - \varepsilon \sin (2\xi )/2 , \end{aligned}$$
(11)

where \(\varepsilon \) is an arbitrary perturbation parameter, and the nonholonomic constraint is now given by

$$\begin{aligned} {A}(\xi ) = \begin{bmatrix} 1&\quad \sin (\xi ) \end{bmatrix} . \end{aligned}$$

The vector spanning the kernel of \({A}(\xi )\) is

$$\begin{aligned} \varvec{k}(\xi ) = \frac{1}{\sqrt{1+\sin (\xi )^2}}\begin{bmatrix}-\sin (\xi ) \\ 1\end{bmatrix}. \end{aligned}$$

We refer to the CVT problem with the driving potential (11) as the pendulum driven CVT. As we shall see in Sect. 3.2, \(\varepsilon \ne 0\) corresponds to a non-reversible perturbation, which tends to destroy the good long-time behaviour of reversible integrators.

For convenience, we write out the equations of motion explicitly:

$$\begin{aligned} \begin{aligned} \ddot{x}_1&= - x_1 + \lambda \\ \ddot{x}_2&= - x_2 + \sin (\xi )\lambda \\ \ddot{\xi }&= - \nabla {V}(\xi ) \\ 0&= \dot{x}_1 + \sin (\xi ) \dot{x}_2 . \end{aligned} \end{aligned}$$
(12)
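As noted after (4), the multiplier can be computed directly from the equations of motion. For (12), differentiating the constraint \(\dot{x}_1 + \sin (\xi )\dot{x}_2 = 0\) in time and substituting the accelerations gives \(\lambda = \bigl (x_1 + \sin (\xi ) x_2 - \dot{\xi }\cos (\xi )\dot{x}_2\bigr )/\bigl (1+\sin (\xi )^2\bigr )\). The sketch below integrates this index-reduced form with a classical Runge–Kutta step, used here only as a reference solver (an assumption for illustration, not one of the integrators of Sect. 3), and monitors the constraint and the passenger energy \(\frac{1}{2}(\dot{x}_1^2+\dot{x}_2^2) + \frac{1}{2}(x_1^2+x_2^2)\). The value \(\varepsilon = 0.1\) and the short time interval are choices made for the example.

```python
import numpy as np

EPS = 0.1   # perturbation parameter epsilon in (11)

def dV(xi):
    """V'(xi) for V(xi) = cos(xi) - EPS * sin(2 xi) / 2, Eq. (11)."""
    return -np.sin(xi) - EPS * np.cos(2.0 * xi)

def multiplier(x, xd, xi, xid):
    """lambda from the time-differentiated constraint of (12)."""
    s, c = np.sin(xi), np.cos(xi)
    return (x[0] + s * x[1] - xid * c * xd[1]) / (1.0 + s**2)

def rhs(z):
    """z = (x_1, x_2, xi, xd_1, xd_2, xid); returns dz/dt for (12)."""
    x, xi, xd, xid = z[0:2], z[2], z[3:5], z[5]
    lam = multiplier(x, xd, xi, xid)
    xdd = -x + np.array([1.0, np.sin(xi)]) * lam     # -x + A(xi)^T lambda
    return np.concatenate([xd, [xid], xdd, [-dV(xi)]])

def rk4_step(z, dt):
    a = rhs(z)
    b = rhs(z + 0.5 * dt * a)
    c = rhs(z + 0.5 * dt * b)
    d = rhs(z + dt * c)
    return z + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)

def constraint(z):
    return z[3] + np.sin(z[2]) * z[4]

def passenger_energy(z):
    return 0.5 * (z[3]**2 + z[4]**2) + 0.5 * (z[0]**2 + z[1]**2)

# First set of initial data from Sect. 3.2.
z = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 1.8973666])
E0 = passenger_energy(z)
for _ in range(1000):            # t in [0, 10] with dt = 0.01
    z = rk4_step(z, 0.01)
print(abs(constraint(z)), abs(passenger_energy(z) - E0))
```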

The last example in this family is the nonholonomic particle, considered in [7]. It is a degenerate case of the harmonically driven CVT in which the spring of one of the shafts has zero stiffness, so

$$\begin{aligned} \kappa = \begin{bmatrix} 0&\quad 1 \end{bmatrix} . \end{aligned}$$

This gives

$$\begin{aligned} \varvec{\varvec{k}}_{0}(\xi ) = \kappa \varvec{k}(\xi ) = \frac{1}{\sqrt{1+\xi ^2}} . \end{aligned}$$

Thus, the evolution matrix (10) is

$$\begin{aligned} \begin{bmatrix} 0 &{}\quad \varvec{\varvec{k}}_{0}(\xi )\\ -\varvec{\varvec{k}}_{0}(\xi )^{\textsf {T}}&{}\quad 0 \end{bmatrix} , \end{aligned}$$

so the Lie subalgebra is \(\mathfrak {g}= \mathfrak {so}(2)\).
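The value of \(\varvec{k}_{0}\) follows from a one-line computation with the CVT kernel vector. Since the nonholonomic particle inherits \(\varvec{f}=0\) from the CVT, the same computation shows why the reduced system (9) is a rotation in the \((y,v)\)-plane, so that \(y^2+v^2 = 2E\) is conserved:

$$\begin{aligned} \varvec{k}_{0}(\xi ) = \begin{bmatrix} 0&\quad 1 \end{bmatrix} \frac{1}{\sqrt{1+\xi ^2}}\begin{bmatrix}-\xi \\ 1\end{bmatrix} = \frac{1}{\sqrt{1+\xi ^2}}, \qquad \frac{\mathrm {d}}{\mathrm {d}t}\left( y^2 + v^2\right) = 2y\,\varvec{k}_{0}(\xi )\,v - 2v\,\varvec{k}_{0}(\xi )\,y = 0 . \end{aligned}$$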

2.2.2 Knife edge

Fig. 2

Illustration of the knife edge system (13). The contact point of the knife edge, or skate, is sliding under gravity on the inclined plane. The direction of sliding is determined by the angle \(\xi \); one may think of a “one-legged skater”, changing direction of his skate according to the driving system

The knife edge (as denoted in [4, § 1.6]), or skate on an inclined plane (as denoted in [19, § 9.1] and [2, § 1.2.5]), is given by

$$\begin{aligned} \begin{aligned} \ddot{x}_1&= 1 -\lambda \sin (\xi ) \\ \ddot{x}_2&= \lambda \cos (\xi ) \\ \ddot{\xi }&= 0 \\ 0&= -\dot{x}_1\sin (\xi ) + \dot{x}_2\cos (\xi ) . \end{aligned} \end{aligned}$$
(13)

An illustration is given in Fig. 2. In terms of the data in Sect. 2, the system is defined by

$$\begin{aligned} \begin{aligned} \varvec{f}&= \begin{bmatrix}1 \\ 0\end{bmatrix}\\ {V}(\xi )&= 0\\ {A}(\xi )&= \begin{bmatrix} -\sin (\xi )&\cos (\xi ) \end{bmatrix}\\ \end{aligned} \end{aligned}$$

and \(\kappa =0\) (it is a \(0\times 2\) matrix), so \(\varvec{y}= [\,]\) (the empty vector). The kernel vector \(\varvec{k}(\xi )\) is

$$\begin{aligned} \varvec{k}(\xi ) = \begin{bmatrix}\cos (\xi ) \\ \sin (\xi )\end{bmatrix}, \end{aligned}$$
(14)

and the evolution equation of the reduced system (9) is simply

$$\begin{aligned} \dot{v} = \varvec{k}(\xi )^{\textsf {T}}\varvec{f}= \cos (\xi ). \end{aligned}$$

Since \(\varvec{y}=[\,]\), the matrix in (10) is given by

$$\begin{aligned} \begin{bmatrix} 0&{}\quad \cos (\xi )\\ 0&{}\quad 0 \end{bmatrix}, \end{aligned}$$

so the underlying group is \({\mathbb R}\).

Consider now a slightly perturbed version of the knife edge, where

$$\begin{aligned} {A}(\xi ) = \begin{bmatrix} -\sin (\xi )&\cos (\xi ) - \varepsilon \end{bmatrix}. \end{aligned}$$

When \(\varepsilon = 0\) this is exactly the knife edge. As we shall see in Sect. 3.1, \(\varepsilon \ne 0\) implies non-integrability.
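For the perturbed constraint, the kernel vector is proportional to \((\cos (\xi ) - \varepsilon , \sin (\xi ))\), so the reduced equation becomes \(\dot{v} = (\cos (\xi ) - \varepsilon )/\sqrt{1 - 2\varepsilon \cos (\xi ) + \varepsilon ^2}\). Since \(\ddot{\xi }= 0\), with \(\dot{\xi }= 1\) the angle \(\xi \) sweeps a full turn every \(2\pi \) time units, so the average of \(\dot{v}\) over one turn tells whether v stays bounded. The sketch below (hypothetical helper names; uniform sampling used as quadrature) shows that this average vanishes for \(\varepsilon = 0\) but is approximately \(-\varepsilon /2\) for small \(\varepsilon \ne 0\), i.e., v then drifts on average.

```python
import numpy as np

def A_perturbed(xi, eps):
    """Perturbed knife-edge constraint row [-sin(xi), cos(xi) - eps]."""
    return np.array([-np.sin(xi), np.cos(xi) - eps])

def k_perturbed(xi, eps):
    """Unit kernel vector of A_perturbed (assumes |eps| < 1, so w never vanishes)."""
    w = np.array([np.cos(xi) - eps, np.sin(xi)])
    return w / np.linalg.norm(w)

f = np.array([1.0, 0.0])                      # gravity load of the knife edge

def mean_forcing(eps, n=4096):
    """Average of v' = k(xi)^T f over one full turn of xi (uniform samples)."""
    xis = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.mean([k_perturbed(xi, eps) @ f for xi in xis])

print(mean_forcing(0.0))    # essentially zero: v stays bounded
print(mean_forcing(0.1))    # approximately -0.05: v drifts on average
```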

Fig. 3

Illustration of the vertical disk, or rolling penny, given by (15). The rotation of the penny is described by \(x_3\) (measured from the z-axis) and the position of the contact point by \((x_1,x_2)\). The directional angle is described by \(\xi \). Because of conservation of angular momentum, we have \(\ddot{\xi } = 0\)

2.2.3 Vertical rolling disk and mobile robot

The vertical rolling disk is a standard example of a nonholonomic system [4, § 1.4]. It is given by

$$\begin{aligned} \ddot{x}_1&= \lambda _1 \nonumber \\ \ddot{x}_2&= \lambda _2 \nonumber \\ \ddot{x}_3&= \lambda _1 \cos (\xi ) + \lambda _2 \sin (\xi ) \nonumber \\ \ddot{\xi }&= 0 \nonumber \\ 0&= \dot{x}_1 + \cos (\xi )\dot{x}_3 \nonumber \\ 0&= \dot{x}_2 + \sin (\xi )\dot{x}_3. \end{aligned}$$
(15)

An illustration is given in Fig. 3. In terms of Sect. 2, the data are

$$\begin{aligned} \begin{aligned} \kappa&= 0\\ \varvec{f}&= 0\\ {A}(\xi )&= \begin{bmatrix} 1&{}\quad 0 &{}\quad \cos (\xi )\\ 0 &{}\quad 1&{}\quad \sin (\xi ) \end{bmatrix} \\ {V}(\xi )&= 0 \end{aligned} \end{aligned}$$

so the kernel \(\varvec{k}(\xi )\) is given by

$$\begin{aligned} \varvec{k}(\xi ) = \frac{1}{\sqrt{2}}\begin{bmatrix}-\cos (\xi )\\ -\sin (\xi )\\ 1\end{bmatrix} . \end{aligned}$$

Since \(\varvec{y}\) is the empty vector and \(\varvec{f}=0\), the reduced Eq. (9) is simply \(\dot{v}=0\), so the underlying group is the trivial group \(\mathrm {1}\).
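The closed-form kernel vector of the rolling disk can be verified numerically. The sketch below (hypothetical helper names, NumPy assumed) checks that \(A(\xi )\varvec{k}(\xi ) = 0\) and \(\Vert \varvec{k}(\xi )\Vert = 1\) for \(\varvec{k}(\xi ) = (-\cos \xi , -\sin \xi , 1)/\sqrt{2}\), and that \(A(\xi )\) has rank 2 for every \(\xi \), so that its kernel is one-dimensional as required by Definition 2.1.

```python
import numpy as np

def A_disk(xi):
    """Constraint matrix of the vertical rolling disk, Eq. (15)."""
    return np.array([[1.0, 0.0, np.cos(xi)],
                     [0.0, 1.0, np.sin(xi)]])

def k_disk(xi):
    """Unit kernel vector (-cos xi, -sin xi, 1) / sqrt(2)."""
    return np.array([-np.cos(xi), -np.sin(xi), 1.0]) / np.sqrt(2.0)

xis = np.linspace(0.0, 2.0 * np.pi, 50)
residual = max(np.abs(A_disk(xi) @ k_disk(xi)).max() for xi in xis)
norm_err = max(abs(np.linalg.norm(k_disk(xi)) - 1.0) for xi in xis)
ranks = {int(np.linalg.matrix_rank(A_disk(xi))) for xi in xis}
print(residual, norm_err, ranks)
```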

A modification of the vertical rolling disk is the mobile robot [7], for which

$$\begin{aligned} {V}(\xi ) = \sin (\xi ). \end{aligned}$$

Thus, the driving system for the mobile robot is

$$\begin{aligned} \ddot{\xi }= -\cos (\xi ). \end{aligned}$$

Everything else is identical to the vertical rolling disk.

3 Numerical experiments

In this section we give examples and counter-examples of good long-time behaviour for nonholonomic integrators applied to the test problems of Sect. 2.2. The counter-examples stem from the perturbed versions of the knife edge and the CVT. The two perturbed problems correspond to two different mechanisms that destroy the good long-time behaviour of nonholonomic integrators: (i) removal of the integrable structure (perturbed knife edge), and (ii) removal of the reversible structure (perturbed CVT).

As representatives for nonholonomic integrators we use five different methods:

  1. DLA\({}^\alpha \) suggested by Cortés Monforte and Martínez [7], with \(\alpha = 1/2\). Since \(\alpha = 1/2\), the resulting integrator is reversible. The method is given by Eq. (28) in appendix Sect. A.1.

  2. DLA\({}^\alpha \) with \(\alpha = 0.4\), making it a non-reversible integrator. The method is given by Eq. (27) in appendix Sect. A.1.

  3. DLA\({}^{0,1}\) suggested by McLachlan and Perlmutter [17]. The method is given by Eq. (29) in appendix Sect. A.1.

  4. The leap-frog method (LF) for nonholonomic systems, suggested in [17] and later revisited by Ferraro, Iglesias, and Martín de Diego [8]. The method is given by Eq. (30) in appendix Sect. A.2.

  5. The discrete derivative method (DD), initially suggested for Hamiltonian systems by Gonzalez [11] and later adapted to nonholonomic systems by Betsch [3]. The method is given in Sect. A.3.

3.1 Knife edge

The initial data are

$$\begin{aligned} x_1(0) = 0, \quad x_2(0) = 0, \quad \xi (0) = \pi /2, \quad \dot{x}_1(0) = 0, \quad \dot{x}_2(0) = 0, \quad \dot{\xi }(0) = 1. \end{aligned}$$

This corresponds to an initially horizontal skate which is rotated by the driving system, thereby picking up speed in the direction of the skate due to gravity. The unperturbed (\(\varepsilon =0\)) and perturbed (\(\varepsilon =0.1\)) systems are integrated using 5 methods:

$$\begin{aligned} \hbox {DLA}{}^{0.5}, \hbox {DLA}{}^{0.4}, \hbox {DLA}{}^{0,1}, \hbox {DD, and LF.} \end{aligned}$$

The stepsize for all methods is \(\Delta t=\pi /10\). The integration interval is [0, 100].

Notice that the dynamics of the driver system \((\xi ,\dot{\xi })\) is trivial for the knife edge (simply \(\ddot{\xi }= 0\)). Thus, all the integrators provide the exact solution of the driver system (by integrator consistency). Consequently, all the integrators exactly preserve the energy h of the driving system.

The evolution of the energy error \(|H(t)-H(0)|\) for all 5 methods is given in Fig. 4. We make the following observations. For the unperturbed system, all integrators except DLA\({}^{0.4}\) exactly or nearly preserve the energy integral H. For the perturbed system, all integrators except DD give a drift in the total energy H.

Fig. 4

Evolution of the error \(|H-H(0)|\) of the energy integral for 5 methods applied to the knife edge. The data are smoothed by a running mean with a window of about 3 time units. For all methods except DLA\({}^{0.4}\), the error for the unperturbed system (\(\varepsilon =0\)) is bounded in time. For the perturbed system (\(\varepsilon =0.1\)) all methods except DD show energy drift
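The error curves in this section are smoothed by a running mean before plotting. A minimal sketch of such smoothing is given below; it assumes a uniform (boxcar) window implemented with `numpy.convolve` in `'valid'` mode, since the precise smoothing procedure is not specified beyond the window length. The error signal in the example is synthetic, not data from the experiments.

```python
import numpy as np

def running_mean(signal, dt, window):
    """Boxcar running mean of a uniformly sampled signal.

    dt     : sampling step of the signal (time units)
    window : window length in time units
    Only the 'valid' part of the convolution is returned, so the output has
    len(signal) - w + 1 samples, where w is the window length in samples.
    """
    w = max(1, int(round(window / dt)))
    return np.convolve(signal, np.ones(w) / w, mode="valid")

# Synthetic stand-in for an error history |H(t) - H(0)| (not real data).
dt = np.pi / 10                               # step size used in Sect. 3.1
t = np.arange(0.0, 100.0, dt)
err = 1e-3 * np.abs(np.sin(3.0 * t))
smooth = running_mean(err, dt, window=3.0)    # about 3 time units
print(len(t), len(smooth))
```

With \(\Delta t = \pi /10\), a window of 3 time units corresponds to 10 samples.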

3.2 Continuously variable transmission (CVT)

The system here is the CVT with potential for the driver system given by (11). We consider two different sets of initial data. First,

$$\begin{aligned} x_1(0) = 1, \quad x_2(0) = 1, \quad \xi (0) = 0, \quad \dot{x}_1(0) = 0, \quad \dot{x}_2(0) = 0, \quad \dot{\xi }(0) = 1.8973666 \end{aligned}$$

corresponding to total energy \(H_0 = 2.8\). Second,

$$\begin{aligned} x_1(0) = 1, \quad x_2(0) = 1, \quad \xi (0) = 0, \quad \dot{x}_1(0) = 0, \quad \dot{x}_2(0) = 0, \quad \dot{\xi }(0) = 2.82842712 \end{aligned}$$

corresponding to total energy \(H_0 = 5.0\). The two sets of initial data yield two different types of behaviour for the driver system. When the energy level is low (\(H_0 = 2.8\)), the phase diagram of the driver system corresponds to a nonlinear pendulum swinging back and forth: we call this the oscillating driver. When the energy level is high (\(H_0 = 5.0\)), the pendulum has enough kinetic energy that it never turns back but keeps rotating in the same direction: we call this the rotating driver. The setup is illustrated in Fig. 5.

Fig. 5

Phase diagrams for the \((\xi ,\dot{\xi })\) subsystem of the CVT problem, with \(\varepsilon =0\) (left) and \(\varepsilon =0.5\) (right). The circular (oscillating driver) and upper (rotating driver) paths correspond, respectively, to the low (\(H_0 = 2.8\)) and high (\(H_0=5.0\)) energy levels. Both diagrams are symmetric under the standard reversibility map \(\dot{\xi }\mapsto -\dot{\xi }\). Notice that the left diagram is also symmetric under the ‘non-physical’ reversibility map \(\xi \mapsto -\xi \), whereas the right diagram does not have this symmetry (due to the perturbation)

The unperturbed (\(\varepsilon =0\)) and perturbed (\(\varepsilon =0.1\)) CVT systems, for both choices of initial data, are integrated using 5 methods:

$$\begin{aligned} \hbox {DLA}{}^{0.5}, \hbox {DLA}{}^{0.4}, \hbox {DLA}{}^{0,1},\hbox { LF, and DD}. \end{aligned}$$

The stepsize for all methods is \(\Delta t=0.1\). The integration interval is [0, 3000].

The evolution of the passenger energy error \(|E(t)-E(0)|\) for all 5 methods is given in Fig. 6, and that of the driver energy error \(|h(t)-h(0)|\) in Fig. 7. Since the CVT system is integrable (as detailed in Sects. 4 and 5), it has an additional first integral for which no explicit formula is available (see Proposition 5.3). We can nevertheless study this integral on the Poincaré section obtained by sampling once every period of the driver system. It can be interpreted as the latitude, with respect to a certain direction (depending on the initial data), of the vector \((x_1,x_2,v)\), where \(v=\dot{\varvec{x}}^{\textsf {T}}\varvec{k}(\xi )\) as in Sect. 2; we therefore call it the latitude integral. The evolution of the latitude integral error (on the Poincaré section) for all 5 methods is given in Fig. 8.
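One way to see where the latitude integral comes from, sketched under stated assumptions: for the CVT, \(\kappa \) is the identity and \(\varvec{f}=0\), so (9) reads \(\dot{\varvec{z}} = \Omega (\xi (t))\varvec{z}\) with \(\Omega \in \mathfrak {so}(3)\) and \(\varvec{z} = (x_1,x_2,v)\). Over one driver period the flow is a rotation \(M\in \mathrm {SO}(3)\) (the Poincaré map), and a rotation preserves both \(|\varvec{z}|^2 = 2E\) and the component of \(\varvec{z}\) along its axis. To obtain an explicitly periodic driver, the sketch below assumes the harmonic driver \(\xi (t) = \cos t\) rather than the pendulum driver (11); the monodromy matrix is computed with RK4, and all helper names are hypothetical.

```python
import numpy as np

# Assumption for illustration: harmonic driver xi(t) = cos(t), i.e. V = xi^2/2
# with xi(0) = 1, xi_dot(0) = 0, so the driver period is exactly 2*pi.
def Omega(t):
    """so(3) matrix of system (9) for the CVT (kappa = I, f = 0); z = (x1, x2, v)."""
    xi = np.cos(t)
    k = np.array([-xi, 1.0]) / np.sqrt(1.0 + xi**2)
    W = np.zeros((3, 3))
    W[:2, 2] = k      # x' = k(xi) v
    W[2, :2] = -k     # v' = -k(xi)^T x
    return W

def period_map(n_steps=4000):
    """Monodromy matrix of z' = Omega(t) z over one period, computed with RK4."""
    dt = 2.0 * np.pi / n_steps
    M = np.eye(3)
    for j in range(n_steps):
        t = j * dt
        a = Omega(t) @ M
        b = Omega(t + 0.5 * dt) @ (M + 0.5 * dt * a)
        c = Omega(t + 0.5 * dt) @ (M + 0.5 * dt * b)
        d = Omega(t + dt) @ (M + dt * c)
        M = M + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
    return M

M = period_map()                              # approximately a rotation in SO(3)
eigvals, eigvecs = np.linalg.eig(M)
axis = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
axis = axis / np.linalg.norm(axis)            # rotation axis (eigenvalue 1)

def latitude(z):
    """Angle between z and the plane orthogonal to the rotation axis."""
    return np.arcsin(axis @ z / np.linalg.norm(z))

z = np.array([1.0, 1.0, 0.0])                 # (x1, x2, v) at t = 0
lats = []
for _ in range(20):                           # sample once per driver period
    lats.append(latitude(z))
    z = M @ z
print(max(lats) - min(lats))                  # tiny: the latitude is conserved
```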

Remark 3.1

Since the first preprint of this paper, the asymmetrically perturbed CVT problem has been used as a test problem for energy–momentum type integrators [5]. In that paper, however, the authors evaluate the performance solely based on conservation of energy. This misses the point: to judge whether an integrator performs well on this problem, one has to study all the integrals, in particular the latitude integral (for which there is no simple formula).

Fig. 6

Error in the passenger energy (6) for the 5 methods applied to the pendulum driven CVT problem (12) for both the oscillating and rotating driver, and two different values of \(\varepsilon \). The data are smoothed by a running mean with a window of 30 time units. In the bottom right diagram, a systematic drift of the energy occurs for all methods but LF

Fig. 7

Error in the driver energy (7) for the 5 methods applied to the pendulum driven CVT problem (12) for both the oscillating and rotating driver, and two different values of \(\varepsilon \). The data are smoothed by a running mean with a window of 30 time units. The only method that does not nearly conserve the driver energy is DD. Notice that even DLA\({}^{0.4}\) behaves well, despite not being reversible: this is fully explained by the fact that the driver system in itself is a Hamiltonian system and DLA\({}^{0.4}\) is a symplectic integrator in the absence of constraints, so backward error analysis of symplectic integrators (cf. [12, § IX.3]) fully explains the near conservation

Fig. 8

Error in the latitude integral, the extra first integral arising from the integrable structure of the CVT problem (see Proposition 5.3). This first integral is ‘hidden’: we have no explicit formula for it. We can only compute it on a Poincaré section by sampling once every period of the driver system. The data are smoothed by a running mean with a window of 30 time units. If the numerical integrator is reversible, then preservation of this integral is fully explained by reversible KAM theory, as discussed in Sect. 3.3. This explains all but one of the results: for the oscillating driver, where reversibility of the driver system is still preserved (corresponding to the green curve in Fig. 5), or when \(\varepsilon = 0\), no reversible integrator exhibits drift. For the rotating driver with \(\varepsilon \ne 0\), every integrator fails except the leap-frog method. It is an open problem to explain why the leap-frog method works for that particular system (note that leap-frog exhibits drift for the perturbed knife edge problem, see Fig. 4)

3.3 Mechanism for near conservation of integrals

The perturbation theory of Kolmogorov, Arnold, and Moser (KAM theory) comes in two flavours: symplectic and reversible. In short, it states that Hamiltonian/reversible perturbations of Hamiltonian/reversible integrable systems nearly preserve invariant tori. In combination with backward error analysis of numerical integrators, KAM theory rigorously explains near conservation of first integrals for symplectic/reversible integrators applied to Hamiltonian/reversible integrable systems. For a thorough treatment we refer to the monograph by Hairer et al. [12] and references therein.

KAM theory, however, is not readily applicable to nonholonomic systems: it applies to integrable systems of ODEs. A main result of this paper is that the class of nonholonomically coupled systems introduced in Sect. 2 can be made compatible with the setting of reversible KAM theory for backward error analysis of reversible integrators, provided that the nonholonomic systems and integrators are reversible with respect to the same reversibility map. We now state this result, which in turn relies on results proved in Sects. 4 and 5.
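Reversibility of a one-step method \(\Psi _h\) with respect to an involution R means \(R\circ \Psi _h\circ R = \Psi _h^{-1}\), the same condition that appears as (ii) later in this section. As a hedged illustration on the driver system alone (not on the full nonholonomic problem), the sketch below checks this identity numerically for the Störmer–Verlet method applied to the pendulum driver (5a) with potential (11), using the standard map \(R(\xi ,\dot{\xi }) = (\xi ,-\dot{\xi })\); explicit Euler is included as a non-reversible contrast. Note that this unconstrained Störmer–Verlet step is not the nonholonomic LF method of [17], and the helper names are hypothetical.

```python
import numpy as np

EPS = 0.1   # perturbation parameter in the driver potential (11)

def dV(xi):
    """V'(xi) for V(xi) = cos(xi) - EPS * sin(2 xi) / 2, Eq. (11)."""
    return -np.sin(xi) - EPS * np.cos(2.0 * xi)

def R(z):
    """Standard reversibility map (xi, xi_dot) -> (xi, -xi_dot)."""
    return np.array([z[0], -z[1]])

def leapfrog(z, dt):
    """Stormer-Verlet step for xi'' = -V'(xi); known to be R-reversible."""
    xi, p = z
    p_half = p - 0.5 * dt * dV(xi)
    xi_new = xi + dt * p_half
    return np.array([xi_new, p_half - 0.5 * dt * dV(xi_new)])

def euler(z, dt):
    """Explicit Euler step: consistent but not R-reversible."""
    xi, p = z
    return np.array([xi + dt * p, p - dt * dV(xi)])

def reversibility_defect(step, z, dt):
    """Norm of R(step(R(step(z)))) - z; zero for all z iff R o step o R = step^{-1}."""
    return np.linalg.norm(R(step(R(step(z, dt)), dt)) - z)

z0 = np.array([0.3, 1.0])
print(reversibility_defect(leapfrog, z0, 0.1))   # round-off level
print(reversibility_defect(euler, z0, 0.1))      # O(dt^2), clearly nonzero
```

Since \(\dot{\xi }\mapsto -\dot{\xi }\) remains a reversibility of the driver for \(\varepsilon \ne 0\) (cf. Fig. 5), the leap-frog defect stays at round-off level for either value of `EPS`.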

As always when analyzing constrained systems, the first step is to reduce the constrained system to an ordinary differential equation on a state space manifold \(\mathcal {M}\). Details of how this is done for the nonholonomically coupled systems are given in Sect. 4 below.

Definition 3.2

Consider a vector field X on \(\mathcal {M}\). We assume that there is a surjection \(\pi :\mathcal {M}\rightarrow \mathcal {N}\). (In our case, \(\mathcal {M}\) and \(\mathcal {N}\) are vector spaces or cylinders, and this surjection is just a linear map.) Assume that X descends to a vector field Y, i.e., \(\pi _* X = Y\). We now say that X is fibrated over an R-integrable system if there is an involution R defined on \(\mathcal {N}\), and the system Y is R-integrable (in the precise sense of Definition 5.1 below).

Theorem 3.3

The state space formulation of each unperturbed (\(\varepsilon =0\)) coupled nonholonomic system defined in Sect. 2.2 is fibrated over a reversible integrable system (Definition 3.2).

Proof

The theorem is a consequence of the results presented in Sects. 4 and 5 below, in particular the final result Proposition 5.4.

We now describe the mechanism by which reversible KAM theory and backward error analysis can be adapted to ODEs fibrated over integrable systems. Given a nonholonomic system whose state space formulation is fibrated over a reversible integrable system, with a projection \(\pi \) and a reversibility map (an involution) \(R:\mathcal {N}\rightarrow \mathcal {N}\) as in Definition 3.2, suppose that a nonholonomic integrator \(\Phi _h:\mathcal {M}\rightarrow \mathcal {M}\) is compatible with \(\pi \) and R in the following sense:

  1. (i)

    It descends to a method \(\Psi _h:\mathcal {N}\rightarrow \mathcal {N}\), i.e.,

    $$\begin{aligned} \Psi _h \circ \pi = \pi \circ \Phi _h . \end{aligned}$$
  2. (ii)

    The descending integrator \(\Psi _h\) preserves the reversibility structure R, i.e.,

    $$\begin{aligned} R \circ \Psi _h \circ R = \Psi _h^{-1}. \end{aligned}$$

Thus, \(\Psi _h\) is a reversible integrator for the reversible integrable system on \(\mathcal {N}\) corresponding to the vector field Y. By backward error analysis of numerical integrators on manifolds (see [12, Ch. IX.2] or [13]), up to exponentially small terms and over exponentially long times, the integrator \(\Psi _h\) coincides with the exact flow of a reversible modified vector field \(Y_h\). By the reversible KAM theorem [20, Th. 2], this explains the near conservation of the first integrals of the reversible system on \(\mathcal {N}\). Since these integrals are unaffected by the motion along the fibres of \(\mathcal {M}\rightarrow \mathcal {N}\), and since the numerical integrator preserves the fibration, they are also nearly conserved as first integrals of the system on \(\mathcal {M}\). This explains near conservation of integrals for reversible perturbations of reversible integrable nonholonomic systems.

The final step is to verify conditions (i) and (ii) above for the integrators in 1. This is straightforward. Consider, for example, the knife edge (Sect. 2.2.2). In this case, the projection \(\pi \) is simply \(\pi (\varvec{x},v,\xi ,\dot{\xi }) = (v,\xi ,\dot{\xi })\). Condition (i) holds for all methods in 1 except DD (the discrete derivative method). Next, the reversibility map is \(\rho (v,\xi ,\dot{\xi }) = (v,\xi ,-\dot{\xi })\). A direct calculation shows that condition (ii) holds for all DLA methods except DLA\({}^{\alpha }\) with \(\alpha \ne \frac{1}{2}\).
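Condition (ii) is easy to test numerically for a given one-step map, since \(R\circ \Psi _h\circ R = \Psi _h^{-1}\) is equivalent to \(R\circ \Psi _h\circ R\circ \Psi _h = \mathrm {id}\). The following minimal sketch checks this identity at a sample point for a driver-type system \(\ddot{\xi } = -V'(\xi )\) with the hypothetical potential \(V(\xi ) = -\cos \xi \) and the reversibility map \((\xi ,\dot{\xi })\mapsto (\xi ,-\dot{\xi })\). It compares leap-frog (Störmer–Verlet) with symplectic Euler, used here only as an example of a symplectic but non-symmetric method; neither function is the implementation used for the paper's experiments.

```python
import numpy as np

def grad_V(xi):
    # Hypothetical driver potential V(xi) = -cos(xi)
    return np.sin(xi)

def rho(state):
    """Reversibility map (xi, xi_dot) -> (xi, -xi_dot)."""
    xi, xi_dot = state
    return np.array([xi, -xi_dot])

def leapfrog(state, h):
    """Stormer-Verlet step: symmetric, hence reversible with respect to rho."""
    xi, xi_dot = state
    v_half = xi_dot - 0.5 * h * grad_V(xi)
    xi_new = xi + h * v_half
    return np.array([xi_new, v_half - 0.5 * h * grad_V(xi_new)])

def symplectic_euler(state, h):
    """Symplectic but not symmetric, hence not reversible with respect to rho."""
    xi, xi_dot = state
    v_new = xi_dot - h * grad_V(xi)
    return np.array([xi + h * v_new, v_new])

def reversibility_defect(step, state, h):
    """Norm of rho(step(rho(step(x)))) - x; it vanishes iff condition (ii) holds at x."""
    x_back = rho(step(rho(step(state, h)), h))
    return np.linalg.norm(x_back - state)

x0 = np.array([0.7, 0.3])
print(reversibility_defect(leapfrog, x0, 0.1))          # round-off level
print(reversibility_defect(symplectic_euler, x0, 0.1))  # of order h**2
```

A defect at round-off level is consistent with reversibility; a defect of order \(h^2\), as for symplectic Euler, shows that condition (ii) fails.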

To summarize our findings: if a nonholonomic integrator is compatible with the fibrated integrable structure of a nonholonomically coupled system, and at the same time respects the reversible structure of the integrable system, then reversible KAM theory fully explains the good long-time behaviour. If, however, the nonholonomic integrator fails to preserve either the fibration structure or the reversible structure of the underlying integrable ODE, then one cannot expect good long-time behaviour. The perturbed problems are constructed precisely to break these structures: the perturbed knife edge breaks the fibration over an integrable system, whereas the perturbed CVT at the high energy level, although still fibrated over an integrable system, breaks reversibility. Let us now discuss how our predictions compare with the numerical results.

3.4 Explanation of the numerical results

From the previous section we extract the following properties of the 5 nonholonomic integrators used in the examples above:

 

|  | DLA\({}^{0.5}\) | DLA\({}^{0.4}\) | DLA\({}^{0,1}\) | DD | LF |
|---|---|---|---|---|---|
| Preserves fibration | \(\surd \) | \(\surd \) | \(\surd \) | \(\times \) | \(\surd \) |
| Preserves reversibility | \(\surd \) | \(\times \) | \(\surd \) | \(\surd \) | \(\surd \) |

Let us now go through the two numerical examples (knife edge and CVT) and relate the performance of each method to our theoretical predictions.

By Theorem 3.3 the unperturbed (\(\varepsilon =0\)) knife edge is fibrated over a reversible integrable system. By our theoretical predictions, every method that preserves both the fibration and the reversibility should perform well, and indeed they do, as confirmed in the left diagram of Fig. 4. We also see that the non-reversible method, DLA\({}^{0.4}\), exhibits drift. This constitutes strong evidence that reversibility, not the discrete Lagrange–d’Alembert principle, is behind near conservation. Notice that DD exactly preserves the energy by construction. Consider now the more interesting case of the perturbed (\(\varepsilon =0.1\)) knife edge. Recall that this perturbation is constructed to destroy the fibration structure, so in this case our theoretical results cannot be applied. In the right diagram of Fig. 4 we see that all methods, except of course DD, exhibit energy drift. This shows that reversibility alone is not enough for near conservation: without the fibration structure, the reversible methods drift as well.

We now turn to the unperturbed (\(\varepsilon =0\)) CVT problem. Recall that this problem has three first integrals: (i) the driver energy, (ii) the passenger energy, and (iii) the more complicated latitude integral. Also recall that there are two types of drivers depending on the initial conditions: oscillating and rotating (see Fig. 5). By Theorem 3.3 the system is fibrated over a reversible integrable system. As expected from our theory, we see in the left columns of Figs. 6, 7, and 8 that the methods that are both fibre preserving and reversible nearly conserve the integrals. Notice also that DD exhibits drift in all the integrals, although total energy is exactly conserved by construction. This illustrates that methods that are “forced” to preserve one of the integrals generally do not perform well on other integrals.Footnote 2 Notice in the left column of Fig. 7 that the driver energy is nearly conserved for DLA\({}^{0.4}\) despite this integrator not being reversible. This is expected, since the driver subsystem is independent of the rest of the dynamics and is in fact a Hamiltonian subsystem. Thus, since DLA\({}^{0.4}\) applied to Hamiltonian systems is a symplectic integrator (albeit not reversible), the standard theory of backward error analysis for symplectic integrators explains the near conservation of the driver energy. (The passenger energy, however, is not conserved for DLA\({}^{0.4}\).)

Finally, we turn to the perturbed (\(\varepsilon =0.5\)) CVT problem. It is constructed to still be fibrated over an integrable system, but it is no longer reversible with respect to the standard reversibility map (being integrable it is, of course, still reversible with respect to a different, more complicated reversibility map). In fact, there are two candidate reversibility maps, reflection in the \(\varvec{q}\)-plane or in the \(\varvec{p}\)-plane, and the reversible methods preserve both of them. A nonzero \(\varepsilon \) destroys the \(\varvec{q}\)-reversibility, but the \(\varvec{p}\)-reversibility remains. However, when the driver is rotating, the \(\varvec{p}\)-reversibility becomes “invisible” to the dynamics, as illustrated in Fig. 5. Therefore, according to our theoretical predictions, the fibre preserving and reversible methods should perform well with an oscillating driver. Indeed they do, as can be seen in the upper right diagrams of Figs. 6, 7, and 8: only the non-reversible DLA\({}^{0.4}\) and the non-descending DD methods exhibit drift. Since the independent driver subsystem is Hamiltonian, we also see for the rotating driver at the bottom right of Fig. 7 that all methods that are symplectic when applied to Hamiltonian systems preserve the driver energy, regardless of whether they are reversible or not. So far, our theory is perfectly aligned with the numerical simulations in “both directions”: whenever the theory is applicable we observe near conservation (as the theory predicts), and whenever it is not applicable we observe drift. We therefore expect, for a rotating driver (where our theory does not apply), to see drift in the passenger energy for all methods. In the bottom right of Fig. 6 we observe drift for all methods but LF. That LF nearly conserves the first integrals of a non-reversible nonholonomic problem came as a surprise. At this stage we have no explanation for it.
We conjecture, however, that there is some reversibility map, or some modified symplectic structure, shared by that particular problem and the integrator map, which would allow us to apply KAM theory.

In the remainder of the paper we prove results leading to Theorem 3.3 above.

4 Linear, periodic systems: averaging and reversibility

Systems of the form (10) in Sect. 2.2 can be written generally as

$$\begin{aligned} \dot{\varvec{u}} = \textsf {A}(\xi ) \varvec{u} \end{aligned}$$

where \(\xi \) is the solution of the driving system (5a) and \(\textsf {A}(\xi )\) takes values in a Lie algebra \(\mathfrak {g}\). Assume now that the energy function of the driving system \(h(\xi ,\dot{\xi }) = \dot{\xi }^2/2 + {V}(\xi )\) has bounded level sets, so that solutions are periodic. After a change of variables, the system \((\xi ,\dot{\xi })\) can be rewritten using an angle variable \(\theta \) such that \(\theta ' = \omega (h)\) for some function \(\omega \) of the energy h (the action variable). We are thus led to study systems of the form

$$\begin{aligned} \dot{\varvec{u}} = \textsf {A}(\theta ) \varvec{u}, \end{aligned}$$
(16)

where \(\textsf {A}\) is periodic in \(\theta \).
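For a concrete driver, the frequency \(\omega (h)\) of the angle variable can be computed explicitly. The sketch below assumes the hypothetical pendulum potential \(V(\xi ) = -\cos \xi \), restricts to librating solutions (\(-1 < h < 1\)), and normalizes \(\theta \) to \({\mathbb R}/{\mathbb Z}\) as in Definition 4.1, so that \(\omega (h) = 1/T(h)\) with \(T(h)\) the period. It uses the classical formula \(T = 2\pi /\mathrm {AGM}(1,\cos (a/2))\) for the period of a pendulum with amplitude \(a\), where AGM denotes the arithmetic-geometric mean.

```python
import math

def pendulum_period(h):
    """Period of xi'' = -sin(xi) on the level set xi_dot**2/2 - cos(xi) = h, for -1 < h < 1.

    Uses T = 2*pi / AGM(1, cos(a/2)), where a = arccos(-h) is the amplitude.
    """
    if not -1.0 < h < 1.0:
        raise ValueError("need -1 < h < 1 (bounded, non-equilibrium level set)")
    amplitude = math.acos(-h)
    a, b = 1.0, math.cos(amplitude / 2)
    for _ in range(60):  # the AGM converges quadratically; 60 iterations is ample
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return 2 * math.pi / a

def omega(h):
    """Frequency of the angle variable theta in R/Z: theta' = omega(h) = 1/T(h)."""
    return 1.0 / pendulum_period(h)

for h in (-0.99, 0.0, 0.9):
    print(h, pendulum_period(h), omega(h))
```

For small amplitudes the period approaches \(2\pi \), the period of the linearized driver, and it grows without bound as \(h\rightarrow 1\) (the separatrix).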

The first aim of this section is to show that systems of the form (16) can be transformed into autonomous linear systems by means of averaging. The second aim is to give conditions under which this averaging transformation is a reversible map.

The results in this section (Theorems 4.2 and 4.4) are generalizations of Theorem 3.1 and Theorem 3.2 in [18].

4.1 Averaging

We first study averaging of systems of the form (16). Averaging means that, after a change of variables, the system (16) is equivalent to a system of the same form but with a constant matrix \(\overline{\textsf {A}}\), which can be interpreted as the average of \(\textsf {A}\). We begin by reformulating systems of the form (16) as follows:

Definition 4.1

Let \(\mathfrak {g}\) be a Lie subalgebra of \(\mathfrak {gl}(n,{\mathbb R})\). A \(\mathfrak {g}\)-periodic differential equation is a system of the form

$$\begin{aligned} \begin{aligned} {\varvec{u}}'&= \textsf {A}\big (\theta \big )\varvec{u}\\ {\theta }'&= 1 \end{aligned} \end{aligned}$$
(17)

where \((\varvec{u},\theta ) \in {\mathbb R}^n \times {\mathbb R}/{\mathbb Z}\) and \(\textsf {A}:{\mathbb R}/{\mathbb Z}\rightarrow \mathfrak {g}\) is a smooth mapping.

Theorem 4.2

Consider a \(\mathfrak {g}\)-periodic system (Definition 4.1). Then there is a smooth change of variables

$$\begin{aligned} \Psi :{\mathbb R}^n \times {\mathbb R}/{\mathbb Z}\ni (\varvec{u},\theta )\mapsto (\varvec{v}(\varvec{u},\theta ),\theta ) \in {\mathbb R}^{n} \times {\mathbb R}/{\mathbb Z}\end{aligned}$$

such that the \(\mathfrak {g}\)-periodic system (17) expressed in the new variables \((\varvec{v},\theta )\) takes the form

$$\begin{aligned} \begin{aligned} {\varvec{v}}'&= \overline{\textsf {A}}\varvec{v} \\ {\theta }'&= 1 \end{aligned} \end{aligned}$$
(18)

for a constant “average” element \(\overline{\textsf {A}}\in \mathfrak {g}\).

Proof

One defines the flow map \(\Phi (\tau )\) as the solution operator of the differential equation defined for all \(\tau \in {\mathbb R}\) by

$$\begin{aligned} \frac{\mathrm {d}{\varvec{w}}(\tau )}{\mathrm {d}\tau } = \textsf {A}(\tau ){\varvec{w}}(\tau ) \end{aligned}$$
(19)

with initial condition at \(\tau = 0\)—the initial time matters because the differential equation is not autonomous. This means that if \({\varvec{w}}\) is a solution of (19), then

$$\begin{aligned} {\varvec{w}}(\tau ) = \Phi (\tau ) {\varvec{w}}(0) \qquad \forall \tau \in {\mathbb R}, \end{aligned}$$
(20)

and vice versa.

Let \(\textsf {G}\subset \mathrm {GL}(n,{\mathbb R})\) be the connected Lie group corresponding to the Lie algebra \(\mathfrak {g}\). Since \(\textsf {A}(\tau )\in \mathfrak {g}\) for all \(\tau \), the flow map after one period, \(\Phi (1)\), is an element of \(\textsf {G}\). As the exponential is surjective from \(\mathfrak {g}\) to \(\textsf {G}\), there exists \(\overline{\textsf {A}}\in \mathfrak {g}\) such that

$$\begin{aligned} \Phi (1)= \exp (\overline{\textsf {A}}). \end{aligned}$$
(21)

We define the mapping \(\overline{\Psi }:{\mathbb R}^n \times {\mathbb R}\rightarrow {\mathbb R}^n \times {\mathbb R}\) by \(\overline{\Psi }(\varvec{u},\tau ) = \big (\varvec{v}(\varvec{u},\tau ),\tau \big )\) and

$$\begin{aligned} \varvec{v}(\varvec{u},\tau ) := \exp (\overline{\textsf {A}}\tau ) \Phi (\tau )^{-1}\varvec{u}. \end{aligned}$$

Recall that, as \(\textsf {A}\) is periodic, for any integer \(n\in {\mathbb Z}\) and any \(\tau \in {\mathbb R}\) we have (cf. Floquet theory [1, § 28])

$$\begin{aligned} \Phi (n+\tau ) = \Phi (\tau )\Phi (n) . \end{aligned}$$
(22)

Now, \(\varvec{v}(\varvec{u},\tau )\) is periodic in \(\tau \), of period one, because, due to (22), and the definition (21) of \(\overline{\textsf {A}}\),

$$\begin{aligned} \exp \big (\overline{\textsf {A}}(\tau +1)\big )\Phi (\tau +1)^{-1}= \exp (\overline{\textsf {A}}\tau )\exp (\overline{\textsf {A}})\Phi (1)^{-1}\Phi (\tau )^{-1}= \exp (\overline{\textsf {A}}\tau )\Phi (\tau )^{-1}. \end{aligned}$$

As a result, the mapping \(\overline{\Psi }\) induces a mapping \(\Psi :{\mathbb R}^n \times {\mathbb R}/{\mathbb Z}\rightarrow {\mathbb R}^n \times {\mathbb R}/{\mathbb Z}\). Note that, as \(\overline{\Psi }\) is a smooth change of variables, so is \(\Psi \). We now proceed to show that \(\overline{\Psi }\) sends solutions of (17) to solutions of (18).

Consider a solution \( \big (\varvec{u}(t),\tau (t)\big )\) of the differential equation

$$\begin{aligned} \begin{aligned} {\varvec{u}}'(t)&= \textsf {A}\big (\tau (t)\big )\varvec{u}(t) \\ {\tau }'(t)&= 1 \end{aligned} \end{aligned}$$

Define \(t_0 = -\tau (0)\). Clearly we then have \(\tau (t) = t - t_0\) and \(\varvec{u}'(t) = \textsf {A}(t-t_0)\varvec{u}(t)\). By defining \({\varvec{w}}(\sigma ) :=\varvec{u}(\sigma + t_0)\) we obtain \( {\varvec{w}}'(\sigma ) = {\varvec{u}}'(\sigma + t_0) = \textsf {A}(\sigma )\varvec{u}(\sigma + t_0) = \textsf {A}(\sigma )\varvec{w}(\sigma ) \), so \({\varvec{w}}\) is a solution of (19). As a result, \({\varvec{w}}(\sigma ) = \Phi (\sigma ){\varvec{w}}(0)\), which, using \(\varvec{u}(t) = {\varvec{w}}\big (\tau (t)\big )\), gives \( \varvec{u}(t) = \Phi \big (\tau (t)\big ) \varvec{u}(t_0) \). We thus obtain that along a solution \(\big (\varvec{u}(t),\tau (t)\big )\),

$$\begin{aligned} \Phi \big (\tau (t)\big )^{-1}\varvec{u}(t) = \varvec{u}(t_0). \end{aligned}$$

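As a result, along a solution we have \(\varvec{v}(t) = \exp \big (\overline{\textsf {A}}\tau (t)\big ) \varvec{u}(t_0)\), and since \(\tau '(t) = 1\),

$$\begin{aligned} {\varvec{v}}'(t) = \overline{\textsf {A}}\exp \big (\overline{\textsf {A}}\tau (t)\big ) \varvec{u}(t_0) = \overline{\textsf {A}}\varvec{v}(t), \end{aligned}$$

which proves the result for the mapping \(\overline{\Psi }\) and thus also for \(\Psi \). \(\square \)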
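The construction in the proof can be reproduced numerically. The following sketch is illustrative only: it assumes a hypothetical period-one family \(\textsf {A}(\theta )\in \mathfrak {so}(3)\) whose values at different \(\theta \) do not commute, computes \(\Phi (1)\) by the classical Runge–Kutta method applied to (19), takes \(\overline{\textsf {A}}\) as the principal logarithm (valid for rotation angles in \((0,\pi )\), which holds for this family), and then checks (21), the Floquet identity (22), and the period-one periodicity of \(\tau \mapsto \exp (\overline{\textsf {A}}\tau )\Phi (\tau )^{-1}\). Only NumPy is used; the exponential and logarithm on \(\mathrm {SO}(3)\) are written out via the Rodrigues formula.

```python
import numpy as np

def skew(w):
    """Element of so(3) corresponding to w in R^3."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def A(theta):
    """Hypothetical period-one family in so(3); values at different theta do not commute."""
    c, s = np.cos(2 * np.pi * theta), np.sin(2 * np.pi * theta)
    return skew(0.5 * np.array([1.0, 0.5 * c, 0.5 * s]))

def flow_map(tau, steps_per_unit=2000):
    """Phi(tau) for (19), dW/dtau = A(tau) W with W(0) = I, via classical RK4."""
    n = max(1, int(round(abs(tau) * steps_per_unit)))
    h = tau / n
    W, t = np.eye(3), 0.0
    for _ in range(n):
        k1 = A(t) @ W
        k2 = A(t + h / 2) @ (W + h / 2 * k1)
        k3 = A(t + h / 2) @ (W + h / 2 * k2)
        k4 = A(t + h) @ (W + h * k3)
        W = W + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return W

def expm_so3(K):
    """exp(K) for K in so(3), by the Rodrigues formula."""
    phi = np.linalg.norm([K[2, 1], K[0, 2], K[1, 0]])
    if phi < 1e-14:
        return np.eye(3) + K
    return np.eye(3) + np.sin(phi) / phi * K + (1 - np.cos(phi)) / phi**2 * (K @ K)

def logm_so3(Rot):
    """Principal logarithm of a rotation whose angle lies in (0, pi)."""
    phi = np.arccos(np.clip((np.trace(Rot) - 1) / 2, -1.0, 1.0))
    return phi / (2 * np.sin(phi)) * (Rot - Rot.T)

Phi1 = flow_map(1.0)       # monodromy, an element of SO(3)
A_bar = logm_so3(Phi1)     # the element of (21): Phi(1) = exp(A_bar)

def v_matrix(tau):
    """Matrix of u -> v(u, tau) = exp(A_bar tau) Phi(tau)^{-1} u."""
    return expm_so3(A_bar * tau) @ np.linalg.inv(flow_map(tau))

print(np.linalg.norm(expm_so3(A_bar) - Phi1))                # (21)
print(np.linalg.norm(flow_map(1.3) - flow_map(0.3) @ Phi1))  # (22) with n = 1
print(np.linalg.norm(v_matrix(1.3) - v_matrix(0.3)))         # periodicity of v
```

All three residuals are at the level of the Runge–Kutta discretization error.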

4.2 Reversibility

We now turn to the question of whether the mapping \(\Psi \) defined in Theorem 4.2 can preserve a reversibility structure. Specifically, we equip the space \({\mathbb R}^n\times {\mathbb R}/{\mathbb Z}\) with a linear involution R, defined as

$$\begin{aligned} R(\varvec{u},\theta ) :=(\rho \varvec{u}, -\theta ), \end{aligned}$$
(23)

for a given linear involution \(\rho \).

We first observe the following condition on \(\rho \) which will turn out to be essential for the preservation of the reversibility structure R in Theorem 4.4.

Lemma 4.3

The \(\mathfrak {g}\)-periodic system (Definition 4.1) is reversible with respect to R if and only if

$$\begin{aligned} \rho \textsf {A}(\theta ) \rho = -\textsf {A}(-\theta ). \end{aligned}$$
(24)

Proof

Let \(f(\varvec{u}, \theta ) :=(\textsf {A}(\theta ) \varvec{u}, 1)\) be the vector field defining the differential equation (17).

As R is linear, reversibility with respect to R means that \(-f(\varvec{u}, \theta ) = R f(R(\varvec{u}, \theta ))\). Note that \(f(R(\varvec{u},\theta )) = f(\rho \varvec{u}, -\theta ) = (\textsf {A}(-\theta )\rho \varvec{u},1)\). This gives the condition \((-\textsf {A}(\theta ) \varvec{u}, -1) = (\rho \textsf {A}(-\theta )\rho \varvec{u}, -1)\), which proves the claim. \(\square \)
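Condition (24) can be tested pointwise for a candidate involution \(\rho \). The sketch below uses a hypothetical period-one family \(\textsf {A}(\theta )\in \mathfrak {so}(3)\) whose first two components are even in \(\theta \) and whose third is odd; for this family \(\rho = \mathrm {diag}(1,1,-1)\) satisfies (24), whereas the involution \(\mathrm {diag}(1,-1,-1)\) does not. The sampling grid is an arbitrary choice, so a small defect is evidence, not a proof.

```python
import numpy as np

def skew(w):
    """Element of so(3) corresponding to w in R^3."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def A(theta):
    """Hypothetical period-one family: components (even, even, odd) in theta."""
    c, s = np.cos(2 * np.pi * theta), np.sin(2 * np.pi * theta)
    return skew(0.5 * np.array([1.0, 0.5 * c, 0.5 * s]))

def condition_24_defect(rho, thetas=np.linspace(0.0, 1.0, 101)):
    """Max over sample points of the norm of rho A(theta) rho + A(-theta)."""
    return max(np.linalg.norm(rho @ A(t) @ rho + A(-t)) for t in thetas)

rho_good = np.diag([1.0, 1.0, -1.0])  # satisfies (24) for this family
rho_bad = np.diag([1.0, -1.0, -1.0])  # also an involution, but violates (24)
print(condition_24_defect(rho_good), condition_24_defect(rho_bad))
```

By Lemma 4.3, the first involution therefore yields a reversibility map R for this system, and the second does not.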

We now turn to the main result of this section.

Theorem 4.4

Assume that (24) holds. Assume further that the average matrix \(\overline{\textsf {A}}\in \mathfrak {g}\) defined in Theorem 4.2 is such that

$$\begin{aligned} \overline{\textsf {A}}\in \Omega , \end{aligned}$$

where \(\Omega \) is a subset of \(\mathfrak {g}\) on which the exponential is injective, and such that \(-\Omega \subset \Omega \) and \(\rho \Omega \rho \subset \Omega \). Then the averaging mapping \(\Psi :{\mathbb R}^{n}\times {\mathbb R}/{\mathbb Z}\rightarrow {\mathbb R}^{n}\times {\mathbb R}/{\mathbb Z}\), defined in Theorem 4.2, preserves reversibility, i.e., \(R\circ \Psi = \Psi \circ R\).

Proof

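Recall that the flow map \(\Phi \) is defined by (20). Using (24), one shows that

$$\begin{aligned} \rho \Phi (-\tau ) = \Phi (\tau ) \rho . \end{aligned}$$

Indeed, by (24), \(\tau \mapsto \rho \Phi (-\tau )\rho \) solves (19) with the identity as initial value at \(\tau = 0\), so it coincides with \(\Phi (\tau )\).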

This implies in particular that \( \rho \Phi (-1) = \Phi (1)\rho \), so using (22), we obtain

$$\begin{aligned} \rho \Phi (1)^{-1}= \Phi (1)\rho . \end{aligned}$$
(25)

Now, recalling the definition of \(\overline{\textsf {A}}\) in (21), we notice that (25) implies that

$$\begin{aligned} \exp (-\overline{\textsf {A}}) = \rho \exp (\overline{\textsf {A}})\rho = \exp (\rho \overline{\textsf {A}}\rho ) . \end{aligned}$$

As \(-\overline{\textsf {A}}\in -\Omega \subset \Omega \) and \(\rho \overline{\textsf {A}}\rho \in \rho \Omega \rho \subset \Omega \), we use the injectivity of the exponential and deduce that \(-\overline{\textsf {A}}= \rho \overline{\textsf {A}}\rho \). We therefore obtain

$$\begin{aligned} \exp (-\overline{\textsf {A}}\tau ) \rho = \rho \exp (\overline{\textsf {A}}\tau ) . \end{aligned}$$
(26)

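By combining (26) with the identity \(\rho \Phi (-\tau ) = \Phi (\tau )\rho \), which gives \(\Phi (-\tau )^{-1}\rho = \rho \Phi (\tau )^{-1}\), we get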

$$\begin{aligned} \begin{aligned} \varvec{v}(\rho \varvec{u}, -\tau )&= \exp (-\overline{\textsf {A}}\tau )\Phi (-\tau )^{-1}\rho \varvec{u} \\&= \exp (-\overline{\textsf {A}}\tau ) \rho \Phi (\tau )^{-1}\varvec{u} \\&= \rho \exp (\overline{\textsf {A}}\tau ) \Phi (\tau )^{-1}\varvec{u} \end{aligned} \end{aligned}$$

so we finally obtain

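$$\begin{aligned} \varvec{v}(\rho \varvec{u}, -\tau ) = \rho \varvec{v}(\varvec{u},\tau ), \end{aligned}$$

which, by the definitions of \(\Psi \) and R, is exactly \(\Psi \circ R = R\circ \Psi \).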

This finishes the proof. \(\square \)
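The statements in this proof can be checked numerically on an example. The sketch below is a hypothetical illustration, not part of the original analysis: it takes a period-one family \(\textsf {A}(\theta )\in \mathfrak {so}(3)\) satisfying (24) with \(\rho = \mathrm {diag}(1,1,-1)\), chooses \(\Omega \) as the rotation generators of norm less than \(\pi \) (so that \(-\Omega \subset \Omega \) and \(\rho \Omega \rho \subset \Omega \)), and verifies \(\rho \Phi (-\tau ) = \Phi (\tau )\rho \), (25), \(\rho \overline{\textsf {A}}\rho = -\overline{\textsf {A}}\), and \(\varvec{v}(\rho \varvec{u},-\tau ) = \rho \varvec{v}(\varvec{u},\tau )\) up to discretization error.

```python
import numpy as np

def skew(w):
    """Element of so(3) corresponding to w in R^3."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def A(theta):
    # Hypothetical family: components (even, even, odd) in theta, so (24) holds for rho below
    c, s = np.cos(2 * np.pi * theta), np.sin(2 * np.pi * theta)
    return skew(0.5 * np.array([1.0, 0.5 * c, 0.5 * s]))

rho = np.diag([1.0, 1.0, -1.0])

def flow_map(tau, steps_per_unit=2000):
    """Phi(tau) for dW/dtau = A(tau) W, W(0) = I (classical RK4; tau may be negative)."""
    n = max(1, int(round(abs(tau) * steps_per_unit)))
    h = tau / n
    W, t = np.eye(3), 0.0
    for _ in range(n):
        k1 = A(t) @ W
        k2 = A(t + h / 2) @ (W + h / 2 * k1)
        k3 = A(t + h / 2) @ (W + h / 2 * k2)
        k4 = A(t + h) @ (W + h * k3)
        W = W + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return W

def expm_so3(K):
    """exp(K) for K in so(3), by the Rodrigues formula."""
    phi = np.linalg.norm([K[2, 1], K[0, 2], K[1, 0]])
    if phi < 1e-14:
        return np.eye(3) + K
    return np.eye(3) + np.sin(phi) / phi * K + (1 - np.cos(phi)) / phi**2 * (K @ K)

def logm_so3(Rot):
    """Principal logarithm: the inverse of exp on generators of norm < pi."""
    phi = np.arccos(np.clip((np.trace(Rot) - 1) / 2, -1.0, 1.0))
    return phi / (2 * np.sin(phi)) * (Rot - Rot.T)

Phi1 = flow_map(1.0)
A_bar = logm_so3(Phi1)

def v_matrix(tau):
    """Matrix of u -> v(u, tau) = exp(A_bar tau) Phi(tau)^{-1} u."""
    return expm_so3(A_bar * tau) @ np.linalg.inv(flow_map(tau))

tau = 0.3
checks = {
    "rho Phi(-tau) = Phi(tau) rho": rho @ flow_map(-tau) - flow_map(tau) @ rho,
    "(25)": rho @ np.linalg.inv(Phi1) - Phi1 @ rho,
    "rho A_bar rho = -A_bar": rho @ A_bar @ rho + A_bar,
    "v(rho u, -tau) = rho v(u, tau)": v_matrix(-tau) @ rho - rho @ v_matrix(tau),
}
for name, residual in checks.items():
    print(f"{name}: {np.linalg.norm(residual):.1e}")
```

For this particular \(\rho \), the identity \(\rho \overline{\textsf {A}}\rho = -\overline{\textsf {A}}\) says that the rotation axis of \(\overline{\textsf {A}}\) lies in the plane fixed by \(\rho \).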

5 Fibrations over reversible integrable systems

The goal of this section is to show that all the nonholonomically coupled systems studied in Sect. 2.2 are fibrated over a reversible integrable system, as summarized in Table 1. The situation is illustrated in Fig. 9.

5.1 Reversible integrability

Fig. 9

An illustration of the setting of the examples treated in this paper. The actual system is represented by the spirals in the manifold \(\mathcal {M}\). The system, however, sits above an integrable system, whose foliation into tori is represented downstairs on \(\mathcal {N}\). If the numerical integrator descends to a reversible integrator downstairs, then there is no drift in the first integrals of that system. Moreover, as the energy depends only on these invariants, there is no energy drift either.

We briefly recall the definition of an R-integrable, or reversible integrable, system [20].

Definition 5.1

Consider a manifold \(\mathcal {N}\), equipped with an involution R. A dynamical system, i.e., a vector field Y on the manifold \(\mathcal {N}\), is R-integrable if there is a map \(\varphi :\mathcal {N}\rightarrow {\mathbb R}^p \times ({\mathbb R}/{\mathbb Z})^k \), which we denote by \(\varphi (x) = (I(x), \theta (x))\), such that

  1. (i)

    \(\varphi \circ R = (I, -\theta )\)

  2. (ii)

    the induced vector field is \(\dot{I} = 0\) and \(\dot{\theta } = \omega (I)\) for a given frequency map \(\omega \)

  3. (iii)

    the image of the frequency map \(\omega \) does not lie in a proper linear subspace.
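As a minimal illustration of Definition 5.1 (not one of the systems of this paper), the harmonic oscillator \(\dot{q} = p\), \(\dot{p} = -q\) with \(R(q,p) = (q,-p)\) is R-integrable with one action and one angle: the map \(\varphi (q,p) = (I,\theta )\), with \(I = (q^2+p^2)/2\) and \(\theta \in {\mathbb R}/{\mathbb Z}\) defined by \(q = \sqrt{2I}\cos 2\pi \theta \), \(p = -\sqrt{2I}\sin 2\pi \theta \), satisfies (i), and the induced dynamics is \(\dot{I} = 0\), \(\dot{\theta } = 1/(2\pi )\), so (iii) holds because the single frequency is nonzero. The sketch checks (i) and (ii) at random sample points.

```python
import numpy as np

def phi(q, p):
    """Action-angle map (I, theta) for q' = p, p' = -q, with theta in R/Z."""
    I = 0.5 * (q**2 + p**2)
    theta = np.mod(np.arctan2(-p, q) / (2 * np.pi), 1.0)
    return I, theta

def R(q, p):
    """Reversibility map (q, p) -> (q, -p)."""
    return q, -p

def exact_flow(q, p, t):
    """Exact time-t flow of q' = p, p' = -q."""
    c, s = np.cos(t), np.sin(t)
    return c * q + s * p, -s * q + c * p

def circle_distance(a, b):
    """Distance between points of R/Z."""
    d = np.mod(a - b, 1.0)
    return np.minimum(d, 1.0 - d)

rng = np.random.default_rng(0)
q, p = rng.normal(size=5), rng.normal(size=5)
I, theta = phi(q, p)

# (i): phi o R = (I, -theta)
I_R, theta_R = phi(*R(q, p))
print(np.max(np.abs(I_R - I)), np.max(circle_distance(theta_R, -theta)))

# (ii): I is constant and theta advances at the frequency omega(I) = 1/(2 pi)
t = 2.7
I_t, theta_t = phi(*exact_flow(q, p, t))
print(np.max(np.abs(I_t - I)), np.max(circle_distance(theta_t, theta + t / (2 * np.pi))))
```

The harmonic oscillator is isochronous (its frequency does not depend on I); it is used here only because its action-angle map is explicit.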

Remark 5.2

The last condition (iii) is called a nondegeneracy, or non-resonance, condition [20]. Note that the usual non-resonance condition (also called the Diophantine condition [12, §X.2.1]) is not strong enough for our examples, as discussed in [18].

We first make a statement about some \(\mathfrak {g}\)-periodic systems (Definition 4.1).

Proposition 5.3

Consider a \(\mathfrak {g}\)-periodic system. We assume that the average matrix \(\overline{\textsf {A}}\) from Theorem 4.2 is either zero, or, if \(n \le 3\), an element of \(\mathfrak {so}(n)\). Suppose further that there exists a map \(\rho \) fulfilling (24). Then the system is R-integrable (Definition 5.1), with R defined in (23).

Proof

The assumptions of Theorems 4.2 and 4.4 are fulfilled, so we obtain a variable transformation which brings the system to the form (18). If \(\overline{\textsf {A}}\) is zero, there are only action variables. If \(\overline{\textsf {A}}\) is in \(\mathfrak {so}(n)\) for \(n \le 3\), we have one more angle variable determined by the only angle of the rotation matrix \(\exp (\overline{\textsf {A}})\). \(\square \)

Proposition 5.4

All the reduced systems defined in Sect. 2.2 fulfill the assumptions of Proposition 5.3.

Proof

The only nontrivial case is that of the knife edge, where the group is \({\mathbb R}\). The average matrix \(\overline{\textsf {A}}\) defined in Theorem 4.2 can be computed explicitly in that case:

$$\begin{aligned} \overline{\textsf {A}}= \begin{bmatrix} 0&{}\quad \overline{\varvec{k}}^{\textsf {T}}{\varvec{f}}\\ 0&{}\quad 0 \end{bmatrix} \end{aligned}$$

where

$$\begin{aligned} \overline{\varvec{k}} :=\frac{1}{2\pi }\int _0^1 \varvec{k}(\xi )\mathrm {d}\xi , \end{aligned}$$

which, following (14), is simply zero.

We define the reversibility structure R from (23), where \(\rho \) is defined as follows. For the CVT, \(\rho \) is

$$\begin{aligned} \rho (\varvec{y},v) = (\varvec{y},-v). \end{aligned}$$

For the remaining systems, \(\rho \) is

$$\begin{aligned} \rho (v) = v. \end{aligned}$$

In both cases, \(\rho \) fulfils (24). \(\square \)

6 Conclusions and open problems

In this section we summarize the main points of our paper and, based on our findings, list a set of open problems. These problems are aimed at the nonholonomic integrators community; our hope is that work on them will lead to a better understanding of geometric integration algorithms for nonholonomic systems.

We start with the conclusions. The field of nonholonomic integrators aims to construct numerical methods specifically designed for nonholonomic systems, aspiring to outperform standard integration schemes. The starting point for designing such integrators is most often some discrete emulation of the Lagrange–d’Alembert principle. In contrast to the holonomic case, however, where Hamilton’s principle leads to conservation of symplecticity, the phase space flow structure induced by the Lagrange–d’Alembert principle is not fully understood: total energy is always conserved, but are there any additional structures? This lack of understanding is reflected in the literature on nonholonomic integrators. Indeed, several different, incompatible discrete versions of the Lagrange–d’Alembert principle have been suggested, and it is unclear whether any of them yields better performance than the others. (Compare, for example, the DLA-principle in [7, 17] with the GNI approach in [8].) All the suggested discrete frameworks do, however, have a common feature: if the nonholonomic system is reversible (with respect to the standard reversibility map), then the discrete phase flow map preserves reversibility. Since all standard nonholonomic test problems are reversible (and typically also integrable), one can expect that the observed numerical behaviour is explained by theory for reversible dynamical systems (for example reversible KAM theory). In this paper we have shown this to be the case for a large class of nonholonomic test problems (fibrations over reversible integrable systems). To test whether reversibility, rather than the discrete Lagrange–d’Alembert principle, is responsible for the good long-time behaviour, we have developed a new class of perturbed nonholonomic test problems that are still integrable but no longer reversible with respect to the standard reversibility map. Our simulations show that none of the tested nonholonomic integrators performs well on all of these problems.
Of course, our specific selection of test problems does not cover all nonholonomic problems studied in the literature. They do, however, in many ways represent the simplest non-trivial nonholonomic systems and are therefore well suited for a first investigation of nonholonomic integrators (the notion being that before moving on to more complicated problems, an integrator should perform well on the simplest non-trivial problems).

In addition to nonholonomic integrators based on discrete emulation of the Lagrange–d’Alembert principle, another approach is to start directly from conservation of energy and momentum and construct algorithms that exactly preserve these conservation laws (energy–momentum methods, see e.g. [3]). Our numerical simulations with a commonly used method from this class show that, although both energy and reversibility are preserved, the fibration structure of the test problems is not, even for the reversible problems, which leads to drift in the “hidden” integrals. This numerical observation agrees with our explanation based on reversible KAM theory, which cannot be applied unless the fibration structure is also preserved. We thus conclude that forcing conservation of first integrals is not enough to obtain good long-time behaviour.

6.1 Open problems

We have formulated a set of problems that reflect, in our opinion, the core challenges for the nonholonomic integrator community. The problems are listed from most to least general.

  1. (1)

    Backward error analysis for relevant classes of nonholonomic systems. In the absence of a complete characterization of nonholonomic systems, one can still develop a numerical backward error analysis (cf. [12]) for subclasses of nonholonomic dynamics where the (reduced) phase space geometry is understood. We suggest starting with the class of integrable nonholonomically coupled systems described in this paper. The problem thus consists in developing nonholonomic integrators such that, when applied to integrable nonholonomically coupled systems, their modified vector fields on the reduced phase space preserve the structure of being fibrated over integrable systems.

  2. (2)

    Explain leap-frog performance in Fig. 6. The lower right diagram of Fig. 6 indicates that the leap-frog method nearly conserves the passenger energy of the perturbed (non-reversible) CVT problem with a rotating driver. As pointed out, this is the only near conservation behaviour we have seen that is not (directly) explained by reversible KAM theory. The open problem is to come up with a rigorous understanding of why near conservation is observed in this case. A hypothesis is that reversible KAM theory can still be used, but with a slightly modified reversibility map which happens to be preserved by leap-frog (but not by the other integrators). A rigorous understanding of this behaviour is likely to shed light on the structure of nonholonomic systems. (We remark again that leap-frog yields drift in integrals for other non-reversible integrable systems, for example the perturbed knife edge, as seen in Fig. 4.)

  3. (3)

    Energy–momentum methods that preserve fibrations. As seen in Fig. 8, the energy preserving method (DD), despite being reversible, performs poorly on the reversible CVT problem (it exhibits drift in the “hidden” first integral). The reason for the poor performance is that this integrator does not preserve the fibration structure. The last open problem consists in finding an energy–momentum nonholonomic integrator that also preserves the fibration structure. Basing the integrator on the average vector field method instead of on discrete derivatives might lead to a solution.