1 Introduction

In classical variational calculus, the Euler–Lagrange (E–L) equations provide necessary conditions for local optimality for functionals whose integrands are sufficiently regular, under the assumption that the extremal solutions are also sufficiently regular. Stronger necessary conditions for local optimality such as those due to Jacobi and Legendre are also well known [1, 2]. But in many cases testable sufficient conditions for global optimality cannot be established unless the integrand is convex. Otherwise, a global minimizer may not exist, may not be unique, or its global optimality may not be provable. A simple example of this is geodesics on the sphere. The Euler–Lagrange equations yield great arcs as the solution, but for most pairs of points on the sphere there are two great arcs connecting them, one of which is globally minimal, and one of which takes the long way around. When the points are antipodal, all great arcs connecting them are globally optimal, hence there is a severe lack of uniqueness. In negatively curved (hyperbolic) spaces it is well known that geodesics generated by solving the Euler–Lagrange equations are globally optimal and unique [3, 4]. But this is a relatively rare situation, and relies on specific geometric arguments. In contrast, the formulation presented here will rely minimally on geometry, yet will have implications for geometric problems.

A broad class of variational problems for which minimality of the solution can be guaranteed is when the functional is convex in both the generalized coordinates and their rates [5, 6]. But this is a tall order, since even in mechanics, where the Lagrangian is the integrand of the action functional, joint convexity of the Lagrangian in coordinates and their rates is rare. Hence, it is well known that the “Principle of Least Action” is a misnomer. It is really the “Principle of Extremal Action,” since the Euler–Lagrange equations are generated and solved but little attention is given as to whether the resulting solution is globally minimal.

1.1 The Euler–Lagrange equations

Perhaps the most trivial variational integral for which the E–L equations provide unique globally optimal solutions is that of the arclength functional for curves in the Euclidean plane. The general single-degree-of-freedom variational problem is that of finding a function \(y=y(x)\) such that the functional

$$\begin{aligned} J[y] \,=\, \int _{x_1}^{x_2} f(y(x),y'(x),x)dx \end{aligned}$$
(1)

is extremized (where \(y' = dy/dx\)) for a given integrand \(f(\cdot )\) subject to boundary conditions \(y(x_1) = y_1\) and \(y(x_2) = y_2\). For example, when

$$\begin{aligned} f(y(x),y'(x),x) = \sqrt{1+ [y'(x)]^2} \end{aligned}$$

the problem becomes that of finding the curve \(y=y(x)\) that extremizes arclength.

The E–L equation corresponding to (1) is

$$\begin{aligned} \frac{\partial f}{\partial y} - \frac{d}{dx} \left( \frac{\partial f}{\partial y'} \right) = 0. \end{aligned}$$
(2)

In the case of the arclength functional, this becomes \(y'' =0\), which when integrated twice gives \(y(x) = ax+b\), the equation of a line. Even without invoking Euclid’s Elements, within the class of problems with solutions of the form \(y = y(x)\), the line is optimal because any perturbation added to it that observes the boundary conditions can only increase J. The details as to why this is so will be addressed shortly, and have far-reaching implications for proving global optimality in other less trivial problems.
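For readers who wish to check such derivations symbolically, the following minimal sketch (assuming Python with SymPy; it is not part of the original development) forms the Euler–Lagrange equation (2) for the arclength integrand and confirms that it reduces to \(y''=0\).

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.symbols('x')
y = sp.Function('y')

# arclength integrand f(y, y', x) = sqrt(1 + (y')^2)
f = sp.sqrt(1 + sp.diff(y(x), x)**2)

# Euler-Lagrange equation (2); it simplifies to y''/(1 + (y')^2)^(3/2) = 0,
# i.e., y'' = 0, whose solutions are the straight lines y = a x + b
eq = euler_equations(f, [y(x)], [x])[0]
print(sp.simplify(eq))
```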

One might argue that the above is limiting because it does not allow for the possibility of curves in the plane connecting the points \((x_1,y_1)\) and \((x_2,y_2)\) that are not functions of the form \(y=y(x)\). To consider the more general case, we would need to seek parameterized curves of the form \(x=y_1(t)\) and \(y=y_2(t)\), and solve a two-degree-of-freedom variational problem. The theory for such problems is also classical. Given

$$\begin{aligned} J[\textbf{y}] \,=\, \int _{t_1}^{t_2} f(y_1,...,y_N,\dot{y}_{1},...,\dot{y}_{N},t)dt \end{aligned}$$

where \(\dot{y}_{i} = dy_i/dt\), the set of E–L equations for \(i=1,...,N\) are

$$\begin{aligned} \frac{\partial f}{\partial y_i} - \frac{d}{dt} \left( \frac{\partial f}{\partial \dot{y}_{i}} \right) = 0. \end{aligned}$$
(3)

This assumes that \(f: \mathbb {R}^N \times \mathbb {R}^N \times [t_1,t_2] \,\longrightarrow \, \mathbb {R}\) is sufficiently smooth for all derivatives to exist, and that extremal trajectories exist which are smooth enough for their evaluation in the E–L equations to make sense. Two special classes of problems will be considered later in which: (1) f is not even differentiable, and hence the E–L equations do not apply, and (2) the Lavrentiev phenomenon occurs, in which f is regular but the resulting global minimizer is not, and therefore does not satisfy the E–L equations. Yet global optimality can be guaranteed in both cases.

Returning to the curve in the Euclidean plane connecting two points, the arclength functional can be written as

$$\begin{aligned} f(y_1,y_2,\dot{y}_{1},\dot{y}_{2},t) \,=\, \sqrt{[\dot{y}_{1}]^2 + [\dot{y}_{2}]^2} , \end{aligned}$$

and the resulting E–L equations are

$$\begin{aligned} \ddot{y}_i = 0 \,\,\,\textrm{for}\,\,\, i=1,2, \end{aligned}$$

thus resulting in the parameterized line \(y_i^*(t) = a_i t + b_i\), or \(\textbf{y}^*(t) = \textbf{a} t + \textbf{b}\) where \(\textbf{a}\) and \(\textbf{b}\) are constant vectors that provide the freedom to match the boundary conditions. The strength of this argument relative to the previous one, again without invoking Euclid, is that it considers the possibility of curves in the plane that connect two points and are not necessarily plotted as a function \(y=y(x)\).

More generally, given two points in a region of a manifold parameterized by coordinates \(\textbf{q}= [q_1,...,q_N]^T\), and metric tensor \(M(\textbf{q})\), the E–L equations (with \(q_i\) taking the place of \(y_i\)) and the functional

$$\begin{aligned} f(\textbf{q},\dot{\textbf{q}},t) = \sqrt{\dot{\textbf{q}}^T \,M(\textbf{q})\, \dot{\textbf{q}}} \end{aligned}$$
(4)

give necessary conditions for a geodesic. The square root is somewhat inconvenient when performing computations, and justification for eliminating it will be given shortly.

A metric tensor can arise in purely geometric contexts, e.g., for surfaces such as Minkowski sums of ellipsoids [7], or it can arise from the kinetic energy of a system of P particles with positions \(\textbf{x}_i \in \mathbb {R}^3\), masses \(m_i\), and generalized coordinates \(\textbf{q}\in \mathbb {R}^N\) (with \(N \le 3P\)), in which case it is well known from classical mechanics that the entries of the mass metric tensor become

$$\begin{aligned} m_{ij}(\textbf{q}) \,=\, \sum _{k=1}^{P} m_k \frac{\partial \textbf{x}_k}{\partial q_i} \cdot \frac{\partial \textbf{x}_k}{\partial q_j} . \end{aligned}$$

Hence the following well-known results from mechanics, together with a simple lemma, will be helpful.

In mechanics, the Lagrangian is defined to be the difference of kinetic and potential energy (Footnote 1)

$$\begin{aligned} L(\textbf{q},\dot{\textbf{q}}) \,=\,T(\textbf{q},\dot{\textbf{q}}) \,-\, V(\textbf{q}) \end{aligned}$$

and the total energy is

$$\begin{aligned} E(\textbf{q},\dot{\textbf{q}}) \,=\,T(\textbf{q},\dot{\textbf{q}}) \,+\, V(\textbf{q}) . \end{aligned}$$

The kinetic energy for a mechanical system always has the form

$$\begin{aligned} T(\textbf{q},\dot{\textbf{q}}) \,=\, \frac{1}{2}\dot{\textbf{q}}^T \,M(\textbf{q})\, \dot{\textbf{q}}. \end{aligned}$$
(5)

The mass matrix of a mechanical system, \(M(\textbf{q})\), satisfies the required properties of a metric tensor.

Lagrange’s equations of motion are the Euler–Lagrange equations (3) with L in place of f, and \(q_i\) in place of \(y_i\), and they describe the motion of conservative systems. That is, if \(\textbf{q}^*(t)\) denotes the solution to Lagrange’s equations, it can be shown that energy is conserved:

$$\begin{aligned} E(\textbf{q}^*,\dot{\textbf{q}}^*) \,=\, const. \end{aligned}$$
(6)

This classical fact will be used in the theorem presented below.

Lemma 1.1

Suppose T is known a priori to be limited to the range \([x_1,x_2]\) where \(0< x_1< x_2 < \infty \), and let \(\Phi : [x_1,x_2] \rightarrow \mathbb {R}_{>0}\) be a twice differentiable monotonically increasing function. Then the solution to the variational problem with a functional of the form \(f(\textbf{q},\dot{\textbf{q}}) = \Phi (T(\textbf{q},\dot{\textbf{q}}))\) will be the same as the solution to Lagrange’s equations in the absence of a potential energy term (Footnote 2).

Proof

From the chain rule

$$\begin{aligned} \frac{\partial f}{\partial q_i} = \Phi '(T) \frac{\partial T}{\partial q_i} \,\,\,\textrm{and} \,\,\, \frac{\partial f}{\partial \dot{q}_i} = \Phi '(T) \frac{\partial T}{\partial \dot{q}_i} , \end{aligned}$$

and

$$\begin{aligned} \frac{d}{dt} \left( \frac{\partial f}{\partial \dot{q}_i}\right) = \Phi ''(T) \dot{T} \frac{\partial T}{\partial \dot{q}_i} + \Phi '(T) \frac{d}{dt}\left( \frac{\partial T}{\partial \dot{q}_i}\right) . \end{aligned}$$

But for a mechanical system with \(V=0\), \(T = const\) from (6), and so \(\dot{T}=0\). Therefore

$$\begin{aligned} \Phi '(T) \left[ \frac{d}{dt} \left( \frac{\partial T}{\partial \dot{q}_i}\right) - \frac{\partial T}{\partial q_i} \right] \,=\, \frac{d}{dt} \left( \frac{\partial f}{\partial \dot{q}_i}\right) - \frac{\partial f}{\partial q_i} = 0 . \end{aligned}$$

Since \(\Phi \) is monotonically increasing on the interval over which it is defined and continuous (since it is twice differentiable), it is also invertible. Moreover, \(\Phi '(T) > 0\). Therefore the variational problems for f and T are interchangeable.

In particular, restricting to trajectories whose conserved kinetic energy \(T_0\) is positive and letting \(\Phi (x) = \sqrt{2x}\), we can solve the geodesic problem without the pesky square root, as if it were a mechanics problem.

The usefulness of this result is now illustrated with the line in the plane where it is easier to prove the global optimality for the integrand T than the integrand \(f = \Phi (T)\) in the original arclength functional. Starting with the variational solution \(\textbf{y}^*(t) = \textbf{a} t + \textbf{b}\) and letting \(\textbf{y}(t) = \textbf{y}^*(t) + \varvec{\epsilon }(t)\), we see that the cost becomes

$$\begin{aligned} \int _{t_1}^{t_2} \Vert \dot{\textbf{y}}\Vert ^2 dt = \int _{t_1}^{t_2} \left\{ \Vert \textbf{a}\Vert ^2 + 2 \textbf{a} \cdot \dot{\varvec{\epsilon }} + \Vert \dot{\varvec{\epsilon }}\Vert ^2 \right\} \, dt \,=\, (t_2-t_1) \Vert \textbf{a}\Vert ^2 + 2 \textbf{a} \cdot \int _{t_1}^{t_2} \dot{\varvec{\epsilon }}dt \,+\, \int _{t_1}^{t_2} \Vert \dot{\varvec{\epsilon }}\Vert ^2 \, dt. \end{aligned}$$
(7)

But since the perturbed solution must satisfy the same boundary conditions,

$$\begin{aligned} \int _{t_1}^{t_2} \dot{\varvec{\epsilon }} \, dt = \varvec{\epsilon }(t_2) - \varvec{\epsilon }(t_1) = \textbf{0} - \textbf{0} = \textbf{0}. \end{aligned}$$

From this it is clear that adding \(\varvec{\epsilon }\) to \(\textbf{y}^*\) can only increase cost in (7). The same argument does not apply to the cost \(\int _{t_1}^{t_2} \Vert \dot{\textbf{y}}\Vert \, dt\) because the square root prevents the integral from acting directly on \(\dot{\varvec{\epsilon }}\). That said, the Cauchy–Schwarz inequality provides the bound

$$\begin{aligned} \int _{t_1}^{t_2} \Vert \dot{\textbf{y}}\Vert \, dt \,\le \, \sqrt{t_2 - t_1} \left( \int _{t_1}^{t_2} \Vert \dot{\textbf{y}}\Vert ^2 \, dt\right) ^{\frac{1}{2}} . \end{aligned}$$

Substituting the minimizer of the problem with integrand \(\Vert \dot{\textbf{y}}\Vert ^2\) into the right hand side then provides an upper bound on the minimal value of the variational problem with integrand \(\Vert \dot{\textbf{y}}\Vert \). Alternatively, substituting that same minimizer directly into the cost with integrand \(\Vert \dot{\textbf{y}}\Vert \) provides another upper bound on the cost of the global minimizer of that problem. In Sect. 2, it will be shown that this second upper bound is in fact the global minimum.
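These observations lend themselves to a quick numerical check. The sketch below (assuming NumPy; the endpoints and the perturbation are arbitrary illustrative choices) confirms that a perturbation vanishing at the endpoints increases the cost with integrand \(\Vert \dot{\textbf{y}}\Vert ^2\) as in (7), and illustrates the Cauchy–Schwarz bound with \(t_2 - t_1 = 1\).

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2001)
a, b = np.array([1.0, 2.0]), np.array([0.0, 0.0])   # line y*(t) = a t + b (hypothetical endpoints)

def energy(y):
    # cost with integrand ||y_dot||^2
    yd = np.gradient(y, t, axis=0)
    return np.trapz(np.sum(yd**2, axis=1), t)

def length(y):
    # cost with integrand ||y_dot||
    yd = np.gradient(y, t, axis=0)
    return np.trapz(np.sqrt(np.sum(yd**2, axis=1)), t)

y_line = np.outer(t, a) + b
eps = np.outer(np.sin(np.pi * t), [0.3, -0.2])      # vanishes at both endpoints
y_pert = y_line + eps

print(energy(y_line), energy(y_pert))               # the line has the smaller energy
print(length(y_pert), np.sqrt(energy(y_pert)))      # Cauchy-Schwarz: length <= sqrt(energy) here
```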

In other words, changing the original problem by an appropriate choice of \(\Phi \) can preserve the E–L necessary conditions while making it easy to guarantee global optimality of the solution to the new problem. Though it is not obvious using elementary arguments that the global optimality of the new problem guarantees it for the original problem, such statements are made in differential geometric settings. This issue will be revisited without geometry in Sect. 2. But first a small detour will be taken to discuss variational problems for which the resulting optimal trajectories lack regularity (i.e., they are non-Lipschitz), and for which the E–L equations are not necessary conditions.

1.2 The Lavrentiev phenomenon

Whereas the classical variational calculus of Euler, Lagrange, Jacobi, etc. assumes functionals with sufficiently smooth integrands and well-behaved resulting extremal trajectories, over the past 100 years variational problems with globally minimal solutions have been investigated in the absence of classical regularity conditions on the integrand and/or the globally minimal trajectories that result.

The Lavrentiev phenomenon has been of particular interest [11,12,13]. As an example, consider trajectories x(t) that minimize the functional

$$\begin{aligned} J[x] = \int _{0}^{1} (x^3 - t)^2 \dot{x}^6 dt \end{aligned}$$
(8)

subject to boundary conditions \(x(0) = 0\) and \(x(1) = 1\). This is the Manià example [12]. Though the integrand is smooth, it can be shown that the globally minimal solution is \(x^*(t) = t^{1/3}\); indeed, along this trajectory the factor \(x^3 - t\) vanishes identically, so \(J[x^*] = 0\), which is clearly the minimum of this nonnegative functional. As this trajectory is not Lipschitzian, but is absolutely continuous, this is an example where the infimum of the functional taken over the set of absolutely continuous trajectories is lower than that taken over Lipschitz trajectories. This is the essence of the Lavrentiev phenomenon.
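As an informal numerical illustration (assuming Python with SciPy; this spot check does not by itself establish the Lavrentiev gap, which concerns the infimum over all Lipschitz trajectories), the functional (8) evaluates to zero along \(x^*(t)=t^{1/3}\) and to \(8/105\) along the Lipschitz competitor \(x(t)=t\).

```python
from scipy.integrate import quad

# J in (8) on a trajectory x(t) with derivative xdot(t)
def J(x, xdot):
    f = lambda t: (x(t)**3 - t)**2 * xdot(t)**6
    return quad(f, 1e-9, 1.0, limit=200)[0]

print(J(lambda t: t**(1/3), lambda t: t**(-2/3)/3))   # ~0: the integrand vanishes identically
print(J(lambda t: t,        lambda t: 1.0))           # 8/105 ~ 0.076 for the Lipschitz competitor
```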

Functionals such as (8) and others that demonstrate the Lavrentiev phenomenon, such as those in [14], are remarkable in that their integrand is regular, and yet the resulting trajectories are not [15]. Consequently, the E–L equations need not be necessary conditions [16]. This remains an active topic of research, with more such examples continuing to emerge [17,18,19].

The following section examines an altogether different kind of variational problem with globally minimal solution. Then in Sect. 3 all of these pieces from previous sections are assembled using a “bootstrapping” procedure in which globally minimal solutions on higher-dimensional spaces are constructed.

2 Nonstandard case studies in global optimality

Though the Euler–Lagrange equations provide necessary conditions for local optimality under the conditions that: (1) f is sufficiently smooth; and (2) a sufficiently smooth solution exists, they provide no guarantee that such a solution will be globally optimal. So-called second-order necessary conditions provide more confidence that the solution to the E–L equations might be minimal, but also provide no guarantee. In the absence of specific a priori knowledge of the properties of the functional, such as convexity, it is usually quite difficult to say anything other than that the solutions generated by the E–L equations are extremals (rather than global minima or maxima), unless additional arguments can be employed as in the case of the line in the previous section.

That said, in certain special situations, the structure of the function \(f(\cdot )\) will guarantee global optimality of the solution generated by the Euler–Lagrange equations. Here special cases are presented in which global optimality of solutions of variational problems is guaranteed when they satisfy the Euler–Lagrange necessary conditions. The emphasis here is on this sense of the word ‘global’ in contrast to global descriptions of geometry or dynamics which can be found elsewhere [3, 20].

2.1 Global optimality in 1D case

Consider the set of monotonically increasing bijective functions that map the closed interval [0, 1] into itself, are twice differentiable on the open interval (0, 1), and satisfy the boundary conditions \(y(0) = 0\) and \(y(1) = 1\). Such functions include but are not limited to \(y(x) = x^p\) for any positive power \(p \in \mathbb {R}_{>0}\). The set of all such functions is closed under function inversion and function composition \((y_1 \circ y_2)(x) = y_1(y_2(x))\), and has an identity element, \(y_{id}(x) = x\). The result is an infinite dimensional group. In this section a variational problem is formulated that selects the one element of this group that globally minimizes a cost functional.

There exists a very general functional in the one-dimensional case for which the global minimality of the solution to the Euler–Lagrange equation is guaranteed. Specifically, let

$$\begin{aligned} f(y,y',x) = m(y) (y')^2 \end{aligned}$$

where \(m:[0,1]\,\longrightarrow \,\mathbb {R}_{>0}\) is differentiable and let

$$\begin{aligned} J[y] = \int _{0}^{1} m(y) (y')^2 dx. \end{aligned}$$
(9)

Global minimization of this sort of functional has been examined informally in [21, 22]. A more rigorous treatment is given below.

2.1.1 Globally minimal reparametrization

Lemma 2.1

Let \(m_1\) and \(m_2\) be real numbers with \(0< m_1< m_2 < \infty \) and let \(m:[0,1] \,\rightarrow \, [m_1,m_2]\) be differentiable. Let \(m^{\frac{1}{2}}(s)\) and \(m^{-\frac{1}{2}}(s)\) respectively be shorthand for \(\sqrt{m(s)}\) and \(1/\sqrt{m(s)}\). The solution to the resulting Euler–Lagrange equations for (9) subject to the boundary conditions \(y(0)=0\) and \(y(1)=1\) satisfies the implicit equation

$$\begin{aligned} y^*(x) = \frac{\int _{0}^{x} m^{-\frac{1}{2}}(y^*(s)) ds}{\int _{0}^{1} m^{-\frac{1}{2}}(y^*(s)) ds} \end{aligned}$$
(10)

and \(y^*(x)\) can be obtained explicitly by inverting the function \(F:[0,1]\,\rightarrow \,\left[ \,0,\sqrt{m_2}\,\right] \) defined by

$$\begin{aligned} F(y) \,\doteq \, \int _{0}^{y} m^{\frac{1}{2}}(s) ds \end{aligned}$$

in the expression

$$\begin{aligned} F(y^*(x)) = F(1) \, x. \end{aligned}$$
(11)

Moreover

$$\begin{aligned} \int _{0}^{1} m^{-\frac{1}{2}}(y^*(s))\,ds \,=\, 1/F(1). \end{aligned}$$
(12)

Proof

Recognizing that \(m'=(dm/dy)y'\), the E–L equation corresponding to (9) will be

$$\begin{aligned} m y'' + \frac{1}{2}m' y' = 0 . \end{aligned}$$

Multiplying through by \(m^{-\frac{1}{2}}(y) = 1/\sqrt{m(y)}\), the result then can be written as

$$\begin{aligned} \left( m^{\frac{1}{2}} y'\right) ' = 0 . \end{aligned}$$

The solution is then of the form

$$\begin{aligned} m^{\frac{1}{2}}(y^*) \frac{dy^*}{dx} = c_1 \end{aligned}$$
(13)

where \(c_1\) is a constant of integration, and so

$$\begin{aligned} y^*(x) = c_2 + c_1 \int _{0}^{x} m^{-\frac{1}{2}}(y^*(s)) ds . \end{aligned}$$

Since \(y(0)=0\) and \(y(1)=1\) this becomes (10).

Alternatively,

$$\begin{aligned} m^{\frac{1}{2}}(y^*) {dy^*} = c_1 {dx} \end{aligned}$$

can be integrated to give

$$\begin{aligned} F(y^*) = c_1 x. \end{aligned}$$

The condition \(F(y^*(1)) = c_1\) together with the boundary condition \(y^*(1)=1\) then gives (11). That is, \(c_1 = F(1) \in \left[ \,\sqrt{m_1},\sqrt{m_2}\,\right] \). But since the same \(c_1\) appears in the normalization of the implicit expression, (12) results.

The following theorem proves that the result of the above lemma is not only a solution to the E–L equations, but that it is globally optimal. Moreover, this global optimality generalizes to cases where the E–L equations no longer apply due to a lack of sufficient regularity, i.e., the case when the function m is not differentiable.

Theorem 2.2

Let all of the conditions of Lemma 2.1 hold, except that the function \(m:[0,1]\,\rightarrow \,[m_1,m_2]\) need only be continuous rather than differentiable. Then the implicit solution (10) and explicit solution (11) globally minimize (9) among all differentiable functions subject to the boundary conditions \(y(0) = 0\) and \(y(1) = 1\).

Proof

Equations (10) and (11) both satisfy (13) without invoking the E–L equations. Substituting \(y^*(x)\) into (9) and observing (13) gives

$$\begin{aligned} J[y^*] = \int _{0}^{1} \left( m^{\frac{1}{2}}(y^*) \frac{dy^*}{dx}\right) \left( m^{\frac{1}{2}}(y^*) \frac{dy^*}{dx}\right) dx \,=\, c_1^2 \end{aligned}$$

where \(c_1 = F(1)\). The continuity and boundedness of m ensures that F is bounded, differentiable, and monotonically increasing. This together with the form of (10) or (11) ensures differentiability of \(y^*\). Since \(c_1^2 = \left( \int _{0}^{1} c_1 dx\right) ^2\),

$$\begin{aligned} J[y^*] = \left( \int _0^1 m^{\frac{1}{2}}(y^*(x)) \frac{dy^*}{dx} dx\right) ^2 = \left( \int _0^1 m^{\frac{1}{2}}(y^*) {dy^*}\right) ^2. \end{aligned}$$

The above is simply a change of variables of integration and has nothing to do with the details of (10) other than the fact that \(y^*(0)=0\) and \(y^*(1) =1\) and \({dy^*}/{dx} > 0\), which is true even though m may not be monotonic, convex, or differentiable. Consequently, the value of the integral in the above expression for \(J[y^*]\) is independent of the path \(y^*(x)\). That is,

$$\begin{aligned} J[y^*] = \left( \int _0^1 m^{\frac{1}{2}}(y) {dy}\right) ^2 \end{aligned}$$

can be computed without reference to \(y^*(x)\) since the name of the variable of integration is irrelevant. Using the Cauchy–Schwarz inequality

$$\begin{aligned} \int _{0}^{1} a(x) b(x) dx \,\le \, \left( \int _{0}^{1} |a(x)|^2 dx\right) ^{\frac{1}{2}} \left( \int _{0}^{1} |b(x)|^2 dx\right) ^{\frac{1}{2}} \end{aligned}$$

with \(a(x) = m^{\frac{1}{2}}(y(x)) {dy}/dx\) and \(b(x) = 1\) then gives

$$\begin{aligned} J[y^*] \le J[y] \end{aligned}$$
(14)

for every possible differentiable y(x) that satisfies the boundary conditions.

What is remarkable about this simple example is that none of the classical sufficient conditions for optimality stated in variational calculus apply. This is not a hyperbolic space, the functional is not convex, etc. And in addition to all of this, the regularity conditions on m needed for the E–L equations to hold can be relaxed. Moreover, as will be seen later, from this simple example it is possible to bootstrap up to cases in which y(x) is replaced by a path defined in a multi-dimensional space and where global optimality is still guaranteed.
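The construction of Lemma 2.1 and the minimality asserted in Theorem 2.2 can also be checked numerically. The sketch below (assuming NumPy/SciPy; the particular mass function \(m\) is a hypothetical choice satisfying the stated bounds) builds \(y^*\) by inverting \(F\) as in (11), confirms \(J[y^*] = (F(1))^2\), and compares against a few perturbed competitors in line with (14).

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

m = lambda y: 1.0 + 0.5 * np.sin(2 * np.pi * y)**2    # hypothetical m with 0 < m1 <= m <= m2

def F(y):
    # F(y) = int_0^y m^(1/2)(s) ds, as in Lemma 2.1
    return quad(lambda s: np.sqrt(m(s)), 0.0, y)[0]

F1 = F(1.0)

def y_star(x):
    # explicit solution (11): solve F(y*) = F(1) x on [0, 1]
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return brentq(lambda y: F(y) - F1 * x, 0.0, 1.0)

def J(y_func, n=1000):
    # numerical evaluation of (9)
    x = np.linspace(0.0, 1.0, n + 1)
    y = np.array([y_func(xi) for xi in x])
    return np.trapz(m(y) * np.gradient(y, x)**2, x)

print(J(y_star), F1**2)                               # agree up to discretization error

# competitors y = y* + eps with eps(0) = eps(1) = 0; (14) predicts J[y*] <= J[y]
for k, c in [(1, 0.1), (2, -0.07), (5, 0.04)]:
    y_pert = lambda x, k=k, c=c: y_star(x) + c * np.sin(k * np.pi * x)
    print(J(y_pert) >= F1**2)                         # True for each competitor
```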

2.1.2 Reparameterizing an extremal solution only makes things worse

Given \(T = \frac{1}{2} \dot{\textbf{q}}^T M(\textbf{q}) \dot{\textbf{q}}\) and \(V=0\), with optimal solution \(\textbf{q}^*(t)\) that conserves kinetic energy \(T(\textbf{q}^*,\dot{\textbf{q}}^*) = T_0\), it is clear that arclength along such a trajectory is

$$\begin{aligned} s(t) \,=\, \int _{0}^{t} \sqrt{\dot{\textbf{q}}^*(t')^T M(\textbf{q}^*(t')) \dot{\textbf{q}}^*(t')}\, dt' \,=\, \int _{0}^{t} \sqrt{2T_0} \, dt' \,=\, \sqrt{2T_0} \, t. \end{aligned}$$

If we were to attempt to reparameterize by substituting \(\textbf{q}^*(t) \,\rightarrow \, \textbf{q}^*(\tau (t))\), with \(\tau \) being the same sort of monotonically increasing function on [0, 1] discussed earlier, the resulting cost (with integrand \(2T\), as justified by Lemma 1.1) would be

$$\begin{aligned} \int _{0}^{1} \dot{\tau }^2(t)\, \dot{\textbf{q}}^*(\tau (t))^T M(\textbf{q}^*(\tau (t))) \dot{\textbf{q}}^*(\tau (t)) \, dt \,=\, 2T_0 \int _{0}^{1} \dot{\tau }^2(t) \, dt . \end{aligned}$$

The optimal \(\tau (t)\) in this context is \(\tau ^*(t) = t\) in which case \(\int _{0}^{1} (\dot{\tau }^*)^2(t) \, dt = 1\). For any other \(\tau (t) = t + \epsilon (t)\), the result will be

$$\begin{aligned} \int _{0}^{1} \dot{\tau }^2(t) \, dt \,=\, \int _{0}^{1} (1 + \dot{\epsilon })^2 \, dt \, =\, 1+ \int _{0}^{1} \dot{\epsilon }^2 \, dt , \end{aligned}$$

which only can increase cost.

2.2 Implications for solving auxiliary variational problems

A result of combining Lemma 1.1 and Theorem 2.2 is

Corollary 2.3

Let T be as in (5). Then the trajectory \(\textbf{q}^*(t)\) that minimizes (Footnote 3)

$$\begin{aligned} J_2[\textbf{q}] \doteq \int _{0}^{1} 2T(\textbf{q},\dot{\textbf{q}}) \, dt \end{aligned}$$

globally subject to the end constraints \(\textbf{q}(0)\) and \(\textbf{q}(1)\) will also minimize

$$\begin{aligned} J_1[\textbf{q}] \doteq \int _{0}^{1} \sqrt{2T(\textbf{q},\dot{\textbf{q}})} \, dt \end{aligned}$$

globally subject to the same end constraints. That is, for any twice differentiable trajectory \(\textbf{q}(t)\) satisfying the boundary conditions

$$\begin{aligned} J_2[\textbf{q}^*] \le J_2[\textbf{q}] \,\,\,\Longrightarrow \,\,\, J_1[\textbf{q}^*] \le J_1[\textbf{q}]. \end{aligned}$$
(15)

Moreover

$$\begin{aligned} J_2[\textbf{q}^*] = (J_1[\textbf{q}^*])^2. \end{aligned}$$
(16)

Proof

If \(\Phi (x) = \sqrt{x}\), then by Lemma 1.1 the E–L equations for both functionals are the same and since the boundary conditions are the same, the resulting solutions will both be \(\textbf{q}^*(t)\).

Let \(\textbf{q}= \textbf{q}(t)\) be an arbitrary differentiable trajectory on [0, 1] that satisfies the end constraints, and let \(\tau (t)\) be a reparameterization of time. Then \((\textbf{q}\circ \tau )(t) = \textbf{q}(\tau (t))\) will also be a feasible trajectory. Suppressing the temporal dependence on \(\tau \), this will be referred to as \(\textbf{q}(\tau )\). Let

$$\begin{aligned} m(\tau ) \,=\, \frac{d\textbf{q}}{d\tau }^T M(\textbf{q}) \,\frac{d\textbf{q}}{d\tau } . \end{aligned}$$

Then \(2T(\textbf{q}(\tau ), \frac{d\textbf{q}}{d\tau } \dot{\tau }) = m(\tau ) \dot{\tau }^2\). If \(\tau ^*\) is the global minimizer of this one-dimensional functional, and if \(\textbf{q}^*(t)\) is the global minimizer of the functional \(J_2\), then

$$\begin{aligned} J_2[\textbf{q}^*] \,\le \, J_2[\textbf{q}(\tau ^*)] \,\le \, J_2[\textbf{q}]. \end{aligned}$$
(17)

Recall that the differentiable trajectory \(\textbf{q}(t)\) is arbitrary up to the same boundary conditions, and \(\textbf{q}(\tau ^*(t))\) traverses the same path in configuration space as \(\textbf{q}(t) = \textbf{q}(\tau (t)=t)\), merely parameterized differently in time.

From Theorem 2.2, we know that \(\textbf{q}(\tau ^*(t))\) will give

$$\begin{aligned} J_2[\textbf{q}(\tau ^*)] = (J_1[\textbf{q}(\tau ^*)])^2. \end{aligned}$$
(18)

This is true for any differentiable \(\textbf{q}(\tau )\) satisfying the end constraints and evaluated at \(\tau = \tau ^*(t)\). The significance of (18) is that since both \(J_1\) and \(J_2\) are positive, then for any given \(\textbf{q}(\tau )\) they must both be minimized by the same \(\textbf{q}(\tau ^*(t))\).

In the case when \(\textbf{q}= \textbf{q}^*\) (the postulated global minimal solution for \(J_2\)), reparameterizing the temporal variable results in \(\tau ^*(t) = t\) since by definition there is no way to reparameterize to improve the global minimum. Therefore, \(J_2[\textbf{q}^*(\tau ^*)] = J_2[\textbf{q}^*]\).

On the other hand, the value of the integral

$$\begin{aligned} J_1[\textbf{q}(\tau )] = \int _{0}^{1} \sqrt{m(\tau )} \, \dot{\tau } \, dt \,=\, \int _{0}^{1} \sqrt{m(\tau )} \, d\tau \end{aligned}$$

is independent of the choice of the differentiable monotonically increasing bijective function \(\tau :[0,1] \rightarrow [0,1]\) regardless of which path \(\textbf{q}(t)\) is taken. That is, for any valid \(\tau (t)\),

$$\begin{aligned} J_1[\textbf{q}(\tau )] = J_1[\textbf{q}] . \end{aligned}$$

Therefore, \(J_1[\textbf{q}^*(\tau )] = J_1[\textbf{q}^*]\) for every such \(\tau \), including \(\tau ^*\) and

$$\begin{aligned} J_1[\textbf{q}^*] \,\le \, J_1[\textbf{q}(\tau ^*)] \,=\, J_1[\textbf{q}]. \end{aligned}$$
(19)

Moreover, from (18), \(J_1\) is globally minimized on the same global minimizer of \(J_2\), namely \(\textbf{q}^*(t)\) and the minimal values are related as (16).

2.3 The Poincaré half space model

As an example where the E–L equations provide a globally minimal solution, consider the Poincaré Half Space Model, which is a Riemannian space of constant negative curvature.

Given an \(N\times N\) metric tensor \(M(\textbf{q}) = [m_{ij}(\textbf{q})]\) and its inverse using the raised index notation \([M(\textbf{q})]^{-1} = [m^{ij}(\textbf{q})]\), the Christoffel symbols are computed as

$$\begin{aligned} \Gamma ^{i}_{\,jk} \doteq \frac{1}{2} \sum _{l=1}^{N} m^{il} \left( \frac{\partial m_{lj}}{\partial q_k} + \frac{\partial m_{lk}}{\partial q_j} - \frac{\partial m_{jk}}{\partial q_l} \right) \,, \end{aligned}$$
(20)

and from these the Riemannian curvature tensor is computed as

$$\begin{aligned} R^{i}_{\,jkl} \doteq -\frac{\partial \Gamma ^{i}_{\,\,jk}}{\partial q_l} + \frac{\partial \Gamma ^{i}_{\,\,jl}}{\partial q_k} + \sum _{m=1}^{N} (- \Gamma ^{m\,}_{\,\,jk} \Gamma ^{i}_{\,\,ml} + \Gamma ^{m\,}_{\,\,jl} \Gamma ^{i}_{\,\,mk}). \end{aligned}$$
(21)

The notation of raising and lowering indices gives

$$\begin{aligned} R_{hjkl} = \sum _{i=1}^{N} m_{hi} R^{i}_{\,jkl} . \end{aligned}$$

A space is said to have constant curvature if [23]

$$\begin{aligned} R_{hjkl} \,=\, K_0 \left( m_{hk} m_{jl} - m_{hl} m_{jk}\right) \,. \end{aligned}$$
(22)

Hyperbolic space defined by constant negative curvature at every point is an example where it is known that geodesics connecting any two points are unique and globally length minimizing. The Poincaré half space model of hyperbolic space is the open half space

$$\begin{aligned} \mathbb {H}^+ \,\doteq \, \{\textbf{x}\in \mathbb {R}^N \,|\, \textbf{x} \cdot \textbf{e}_N > 0\} \end{aligned}$$

endowed with the metric tensor

$$\begin{aligned} M(\textbf{x}) = (\textbf{x} \cdot \textbf{e}_N)^{-2} \, \mathbb {I}. \end{aligned}$$

For example, in the 2D case this consists of points \((x_1,x_2) \in \mathbb {R}^2\) with \(x_2>0\), and

$$\begin{aligned} \dot{\textbf{x}}^T M(\textbf{x}) \dot{\textbf{x}} = \frac{\dot{x}_1^2 + \dot{x}_2^2}{x_2^2} . \end{aligned}$$

In this case it can be shown that (22) holds with

$$\begin{aligned} K_0 \,=\, -1. \end{aligned}$$
(23)
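The verification that (22) holds with \(K_0 = -1\) is mechanical and can be delegated to a computer algebra system. The following sketch (assuming SymPy) implements (20)–(22) directly for the 2D half-plane metric.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
q, N = [x1, x2], 2
M = sp.diag(1/x2**2, 1/x2**2)        # half-plane metric tensor
Minv = M.inv()

# Christoffel symbols (20): Gamma[i][j][k] = Gamma^i_{jk}
Gamma = [[[sp.Rational(1, 2) * sum(Minv[i, l] * (sp.diff(M[l, j], q[k])
                                                 + sp.diff(M[l, k], q[j])
                                                 - sp.diff(M[j, k], q[l]))
                                   for l in range(N))
           for k in range(N)] for j in range(N)] for i in range(N)]

# Riemann curvature tensor (21), followed by lowering the first index
def R_up(i, j, k, l):
    val = -sp.diff(Gamma[i][j][k], q[l]) + sp.diff(Gamma[i][j][l], q[k])
    val += sum(-Gamma[mm][j][k] * Gamma[i][mm][l] + Gamma[mm][j][l] * Gamma[i][mm][k]
               for mm in range(N))
    return val

def R_down(h, j, k, l):
    return sum(M[h, i] * R_up(i, j, k, l) for i in range(N))

# constant-curvature condition (22) with K0 = -1
K0 = -1
print(all(sp.simplify(R_down(h, j, k, l) - K0 * (M[h, k] * M[j, l] - M[h, l] * M[j, k])) == 0
          for h in range(N) for j in range(N) for k in range(N) for l in range(N)))   # True
```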

The family of geodesics in this case is parameterized by the group \(PSL(2,\mathbb {R}) \,\doteq \, SL(2,\mathbb {R})/\{\mathbb {I},-\mathbb {I}\}\) where

$$\begin{aligned} SL(2,\mathbb {R}) \,=\, \left\{ \left. \left( \begin{array}{cc} a &{}\quad b \\ c &{}\quad d \end{array}\right) \,\right| \, ad - bc \,=\,1\right\} . \end{aligned}$$

Let \(i = \sqrt{-1}\) and let abcd be real numbers satisfying \(ad - bc = 1\). Define the curve in the complex plane

$$\begin{aligned} z^*(t) = \frac{b + ia e^t}{d + i c e^t} = x_1^*(t) + i x_2^*(t) . \end{aligned}$$

That is, \(x_1^*(t) = Re(z^*(t))\) and \(x_2^*(t) = Im(z^*(t))\). (Optimality properties of this curve will be demonstrated shortly, and \(*\) denotes this, i.e., it is not Hermitian conjugation). Explicitly,

$$\begin{aligned} x_1^*(t) = (bd + ac e^{2t})(d^2 + c^2 e^{2t})^{-1} \,\,\,\textrm{and}\,\,\, x_2^*(t) = e^{t} (d^2 + c^2 e^{2t})^{-1}. \end{aligned}$$
(24)

The signed curvature of these curves as viewed in the Euclidean plane, computed using the general formula

$$\begin{aligned} k(t) = \frac{\dot{x}_1 \ddot{x}_2 - \dot{x}_2 \ddot{x}_1}{\left( \dot{x}_1^2 + \dot{x}_2^2\right) ^{3/2}} \end{aligned}$$
(25)

is \(k^*(t) = -2cd\), which is constant, indicating a clockwise bending circular arc. Since these are circles, they can be written as

$$\begin{aligned} (x_1^* - x_0)^2 + (x_2^*)^2 = r^2 \end{aligned}$$
(26)

where \(r = 1/|k^*| = 1/(2|cd|)\) and

$$\begin{aligned} x_0 = \frac{1}{2}\left( \frac{b}{d} + \frac{a}{c}\right) . \end{aligned}$$

It can be shown that on these circles

$$\begin{aligned} \frac{(\dot{x}_1^*)^2 + (\dot{x}_2^*)^2}{(x_2^*)^2} = 1, \end{aligned}$$
(27)

and consequently t is arclength in the hyperbolic metric.

Moreover, the Euler–Lagrange equations for the functional

$$\begin{aligned} J = \int _{t_0}^{t_1} \frac{\dot{x}_1^2 + \dot{x}_2^2}{x_2^2} dt \end{aligned}$$
(28)

can be written as

$$\begin{aligned} \frac{d}{dt}\left( \frac{\dot{x}_1}{x_2^2}\right) \,=\, 0 \end{aligned}$$
(29)
$$\begin{aligned} \ddot{x}_2 \,=\, x_{2}^{-1} \left( \dot{x}_2^2 - \dot{x}_1^2\right) . \end{aligned}$$
(30)

The functions \(x_i^*(t)\) given above satisfy these equations. Consequently (24) is an extremal of (28). In particular, (29) gives

$$\begin{aligned} \frac{\dot{x}_1^*}{(x_2^*)^2} = 2cd . \end{aligned}$$

From the fact that \({\dot{x}_1^*}/{(x_2^*)^2}\) is constant, it is clear that if \(\epsilon _1(t)\) is a nonzero arbitrary differentiable function that vanishes at \(t=t_0\) and \(t=t_1\), it must be that

$$\begin{aligned} \int _{t_0}^{t_1} \frac{(\dot{x}_1^*)^2 + (\dot{x}_2^*)^2}{(x_2^*)^2} dt < \int _{t_0}^{t_1} \frac{(\dot{x}_1^*+\dot{\epsilon }_1)^2 + (\dot{x}_2^*)^2}{(x_2^*)^2} dt \end{aligned}$$

based on the same argument as for the line. That is, when expanding the term in parentheses, the resulting term linear in \(\dot{\epsilon }_1\) integrates to zero.

Moreover, since these curves \(\textbf{x}^*(t)\) are already parameterized by arclength, replacing them with \(\textbf{y}(t) = \textbf{x}^*(\tau (t))\) where \(\tau (t) \ne t\) is monotonically increasing and \(\tau (t_0) = t_0\) and \(\tau (t_1) = t_1\) can only increase the value of J. Consequently,

$$\begin{aligned} \int _{t_0}^{t_1} \frac{(\dot{x}_1^*)^2 + (\dot{x}_2^*)^2}{(x_2^*)^2} dt < \int _{t_0}^{t_1} \frac{\dot{y}_1^2 + \dot{y}_2^2}{y_2^2} dt . \end{aligned}$$

Writing \(\textbf{y}(t) = \textbf{x}^*(t) + \varvec{\nu }(t)\) (where there is one degree of freedom in \(\varvec{\nu }(t)\) following from the freedom of the scalar function \(\tau (t)\)) and combining the above two inequalities provides evidence for the following statement.

If \(\epsilon _1(t)\) and \(\epsilon _2(t)\) are two independent nonzero arbitrary differentiable functions that vanish at \(t=t_0\) and \(t=t_1\), then

$$\begin{aligned} \int _{t_0}^{t_1} \frac{(\dot{x}_1^*)^2 + (\dot{x}_2^*)^2}{(x_2^*)^2} dt < \int _{t_0}^{t_1} \frac{(\dot{x}_1^* + \dot{\epsilon }_1)^2 + (\dot{x}_2^* + \dot{\epsilon }_2)^2}{(x_2^* + \epsilon _2)^2} dt , \end{aligned}$$

indicating that (24) are globally minimal geodesics.
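The key identities used above, namely the unit hyperbolic speed (27) and the constancy of \(\dot{x}_1^*/(x_2^*)^2\) in (29), can be confirmed symbolically. A minimal sketch (assuming SymPy, with \(a\) eliminated via \(ad-bc=1\) and \(d \ne 0\)) follows.

```python
import sympy as sp

t, b, c, d = sp.symbols('t b c d', real=True)
a = (1 + b*c)/d                           # enforces ad - bc = 1 (assumes d != 0)

# the curves (24)
x1 = (b*d + a*c*sp.exp(2*t))/(d**2 + c**2*sp.exp(2*t))
x2 = sp.exp(t)/(d**2 + c**2*sp.exp(2*t))

# (27): unit hyperbolic speed
print(sp.simplify((sp.diff(x1, t)**2 + sp.diff(x2, t)**2)/x2**2 - 1))   # 0
# conserved quantity from (29): x1_dot / x2^2 = 2 c d
print(sp.simplify(sp.diff(x1, t)/x2**2 - 2*c*d))                        # 0
```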

2.4 Special higher dimensional vectorial cases

In the previous subsection, the Poincaré half-plane model was used. Another model of the hyperbolic plane is the Poincaré disk model in which

$$\begin{aligned} f(\textbf{x},\dot{\textbf{x}}) = \dot{\textbf{x}}^T M(\textbf{x}) \dot{\textbf{x}} \,\,\,\textrm{where} \,\,\, M(\textbf{x}) \,=\, \frac{4}{(1-\Vert \textbf{x}\Vert ^2)^2} \mathbb {I}. \end{aligned}$$

Extensions of the Poincaré half-plane and disk models to higher dimensions exist. For example, the above expression for \(M(\textbf{x})\) holds in n dimensions. The n-dimensional half-space model has \(M(\textbf{x}) = x_{n}^{-2} \mathbb {I}\). From geometric considerations it is known that geodesics in these spaces are global minimizers of length.

All of the above are of the form

$$\begin{aligned} f(\textbf{x},\dot{\textbf{x}}) = m\left( \sum _{i=1}^{N} a_i x_i^2\right) \sum _{j=1}^{N} b_j \dot{x}_j^2 \end{aligned}$$

where \(b_j = 1\) and \(a_i \in \{0,1\}\). That is, the mass matrix consists of a scalar mass function, \(m(\cdot )\), which depends on the coordinates in a particular way, multiplying the identity matrix. Therefore, functions of the form above are a reasonable candidate for exploring wider conditions on the parameters \(\{a_i\}\) and \(\{b_j\}\) such that global minimality of the variational problem is guaranteed.

Another very special vectorial case is when there are coordinates \(\{{q_i}'\}\) such that

$$\begin{aligned} f(\textbf{q}',\dot{\textbf{q}}',t) = \sum _{i=1}^{N} \lambda _i({q_i}') ({\dot{q}_i}')^2 \end{aligned}$$

where each \(\lambda _i\) is a continuous function on the interval [0, 1] that takes finite positive values. Then global optimization in each coordinate can be performed independently as per Theorem 2.2. Likewise, if a coordinate transformation can reduce a coupled vectorial problem to this form, then the global optimum can be found in each new coordinate, and the result can be transformed back. Such cases are not the norm. That is, if the matrix

$$\begin{aligned} M(\textbf{q}) \,=\, S^T \Lambda (S \textbf{q}) \,S \end{aligned}$$

for some invertible constant matrix S, then the variational problem for \( f(\textbf{q},\dot{\textbf{q}},t) \doteq \dot{\textbf{q}}^T M(\textbf{q}) \,\dot{\textbf{q}} \) will have a globally minimal solution if it is defined on the region \(S^{-1} \cdot [0,1]^N\), because then in the coordinates \(\textbf{q}' = S \textbf{q}\), everything will reduce to \(f(\textbf{q}',\dot{\textbf{q}}',t)\).

2.5 A suboptimal ansatz

For more complicated variational problems, employing numerical shooting methods to meet boundary conditions can be computationally costly. Moreover, since globally optimal solutions might not even exist, it can sometimes be useful to obtain a fast suboptimal solution to a variational problem rather than an exact extremal.

Motivated by the splitting in Theorem 2.2, we can define a trajectory \(\textbf{q}^{\circ }(t)\) such that for symmetric positive definite \(M(\textbf{q}) \in \mathbb {R}^{n\times n}\) (as in kinetic energy or a metric tensor) and \(R(\textbf{q},t) \in SO(n)\),

$$\begin{aligned} R^T(\textbf{q}^{\circ }) \, M^{\frac{1}{2}}(\textbf{q}^{\circ }) \, \dot{\textbf{q}}^{\circ } = \textbf{c} \end{aligned}$$

where \(\textbf{c}\) is a constant vector. Then the kinetic energy \(T=\frac{1}{2} \dot{\textbf{q}}^T M(\textbf{q}) \dot{\textbf{q}}\) along this path will remain constant and

$$\begin{aligned} \textbf{q}^{\circ }(t) = \textbf{q}(0) + \int _{0}^{t} M^{-\frac{1}{2}}(\textbf{q}^{\circ }(\tau )) R(\textbf{q}^{\circ }(\tau ),\tau ) \, \textbf{c} \, d\tau . \end{aligned}$$
(31)

Here \(M^{\frac{1}{2}}(\textbf{q})\) is the matrix square root of \(M(\textbf{q})\) such that \(M^{\frac{1}{2}}(\textbf{q})M^{\frac{1}{2}}(\textbf{q}) = M(\textbf{q})\) and \( M^{-\frac{1}{2}}(\textbf{q}) = [M^{\frac{1}{2}}(\textbf{q})]^{-1}\), and \(R(\textbf{q},t) \in SO(n)\) is a differentiable trajectory with \(R(\textbf{q}(0),0) = \mathbb {I}\) that can be used to steer the behavior of \(\textbf{q}^{\circ }(t)\) without adversely affecting the cost.

The constant vector \(\textbf{c}\) provides sufficient freedom to match arbitrary \(\dot{\textbf{q}}(0)\) or \(\textbf{q}(1)\), but not both (Footnote 4). For example, when specifying initial conditions,

$$\begin{aligned} \textbf{q}^{\circ }(t) = \textbf{q}(0) + \left( \int _{0}^{t} M^{-\frac{1}{2}}(\textbf{q}^{\circ }(\tau )) R(\textbf{q}^{\circ }(\tau ),\tau ) \, d\tau \right) \, M^{\frac{1}{2}}(\textbf{q}(0)) \, \dot{\textbf{q}}(0). \end{aligned}$$
(32)

From this, \(\dot{\textbf{q}}(0)\) can be selected so as to match \(\textbf{q}(1)\) when the matrix in parentheses is invertible. Of course, it is not possible to specify both \(\dot{\textbf{q}}(0)\) and \(\textbf{q}(1)\) independently.

In general this will not be a solution to the E–L equations for the variational problem with integrand (5), unless \(M(\textbf{q}) = M_0\) is constant. And as such the solution will not be optimal in general. But since the E–L solution has \(T = T_0 = const\), and

$$\begin{aligned} \frac{1}{2}(\dot{\textbf{q}}^{\circ })^T M(\textbf{q}^{\circ }) \dot{\textbf{q}}^{\circ } \,=\, \frac{1}{2}\dot{\textbf{q}}(0)^T M(\textbf{q}(0)) \dot{\textbf{q}}(0) \,=\, T_0 , \end{aligned}$$

the cost of this ansatz appears not to be so bad. Moreover its structure lends itself to rapid numerical shooting to match the distal boundary condition.

For example, consider the case of geodesics in the Poincaré half plane model. Suppose that the solution were not known and we seek a reasonable approximation. Using this ansatz with \(R = \mathbb {I}\), the result would be

$$\begin{aligned} \dot{x}_1^{\circ } \,=\, c_1 {x}_2^{\circ } \,\,\,\textrm{and}\,\,\, \dot{x}_2^{\circ } \,=\, c_2 {x}_2^{\circ } . \end{aligned}$$

This means that

$$\begin{aligned} {x}_2^{\circ }(t) = {x}_2(0) e^{c_2 t} \end{aligned}$$

which can be back substituted to give

$$\begin{aligned} {x}_1^{\circ }(t) = \alpha + \beta e^{c_2 t} . \end{aligned}$$

Now suppose we want to compare to the exact solution presented in the previous section. One way would be to examine how the trajectories with the same initial conditions differ. However, a more telling comparison is to examine the true solution and the ansatz approximation for given boundary conditions. The above solutions are written in terms of boundary conditions as

$$\begin{aligned} {x}_2^{\circ }(t) \,=\, {x}_2(0) e^{t \ln ({x}_2(1)/{x}_2(0))} \end{aligned}$$
(33)

which can be back substituted to give

$$\begin{aligned} {x}_1^{\circ }(t) = \left( x_1(0) - x_2(0) \frac{x_1(1) - x_1(0)}{x_2(1) - x_2(0)}\right) + x_2(0) \frac{x_1(1) - x_1(0)}{x_2(1) - x_2(0)} e^{t \ln ({x}_2(1)/{x}_2(0))}. \end{aligned}$$
(34)

Consider the boundary conditions \(\textbf{x}(0)\) and \(\textbf{x}(1)\) from the solutions in the previous section:

$$\begin{aligned} \textbf{x}^*(0) \,=\, \left( \begin{array}{c} (bd + ac)(d^2 + c^2)^{-1} \\ (d^2 + c^2)^{-1} \end{array}\right) \,\,\,\textrm{and}\,\,\, \textbf{x}^*(1) \,=\, \left( \begin{array}{c} (bd + ac e^2)(d^2 + c^2 e^{2})^{-1} \\ e (d^2 + c^2 e^{2})^{-1} \end{array}\right) . \end{aligned}$$

Though it meets the boundary conditions, the energy for the ansatz is a larger constant than for the geodesics. Introducing the rotation \(R(x_1,x_2,t)\) provides freedom to reduce this energy while still meeting the same boundary conditions. For example, suppose \(R = R(t)\) is defined by the counterclockwise angle \(\theta (t)\). Then

$$\begin{aligned} \frac{\dot{x}_2^{\circ }}{{x}_2^{\circ }} = c_1 \sin \theta (t) + c_2 \cos \theta (t) \end{aligned}$$

or

$$\begin{aligned} {x}_2^{\circ }(t) \,=\, {x}_2^{\circ }(0) \, \exp \left( c_1 \int _{0}^{t} \sin \theta (\tau ) d\tau + c_2 \int _{0}^{t} \cos \theta (\tau ) d\tau \right) . \end{aligned}$$

In the simplest case the rotation has one variable to use to steer the solution: \(\theta (t) = \omega _0 t\), and \({x}_1^{\circ }(t)\) can be obtained numerically.
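A numerical comparison between the ansatz and the exact geodesic is straightforward. The sketch below (assuming NumPy; the endpoints are arbitrary, and the closed-form expression for hyperbolic distance in the half-plane is a standard fact quoted here rather than derived in the text) computes the constant energy \(c_1^2 + c_2^2\) of the ansatz (33)–(34) and the minimal energy, which by (16) equals the squared hyperbolic distance.

```python
import numpy as np

def ansatz_energy(p0, p1):
    # along (33)-(34) the integrand (x1_dot^2 + x2_dot^2)/x2^2 is the constant c1^2 + c2^2
    (x10, x20), (x11, x21) = p0, p1      # assumes x2(1) != x2(0)
    c2 = np.log(x21 / x20)
    c1 = c2 * (x11 - x10) / (x21 - x20)
    return c1**2 + c2**2

def geodesic_energy(p0, p1):
    # minimal energy over [0,1] = (hyperbolic distance)^2, using the standard
    # closed form arccosh(1 + (dx1^2 + dx2^2)/(2 x2(0) x2(1)))
    (x10, x20), (x11, x21) = p0, p1
    dist = np.arccosh(1 + ((x11 - x10)**2 + (x21 - x20)**2) / (2 * x20 * x21))
    return dist**2

p0, p1 = (0.0, 1.0), (2.0, 3.0)          # hypothetical boundary conditions
print(ansatz_energy(p0, p1), geodesic_energy(p0, p1))   # ansatz energy is the larger
```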

3 Global optimality and bootstrapping

In the general vectorial case there are many possible paths connecting the initial and final values \(\textbf{q}(0)\) and \(\textbf{q}(1)\). For this reason the argument used in the 1D case earlier cannot generalize, and the Euler–Lagrange equations do not in general provide a globally optimal solution. The vectorial extension of the integrand in (9) is of the form \(\dot{\textbf{q}}^T M(\textbf{q}) \dot{\textbf{q}}\). Here \(M(\textbf{q})\) might be the metric tensor for a Riemannian manifold, in which case the solutions of the Euler–Lagrange equations will be equivalent to the equations giving geodesic paths. As is clear even in simple cases such as the spheres \(S^1\) and \(S^2\), the geodesics connecting any two points are not unique. Moreover, both the shorter and the longer of the two great arcs connecting two points are solutions to the variational problem when the points are not antipodal. And when they are, an infinite number of great arcs connect them, illustrating that shortest paths need not be unique.

From these examples it is clear that global optimality is closely related to the statement of the problem as a boundary value problem. If instead the problem is restated as that of finding the path from \(\textbf{q}(0)\) with specified value of \(\dot{\textbf{q}}(0)\) that minimizes the cost functional \(\int _{0}^{1} f(\textbf{q},\dot{\textbf{q}},t)dt\), then the nonuniqueness of solutions of the Euler–Lagrange equations disappears. In the case of the sphere, even the longer great arc connecting two points then will be globally optimal. In the one-dimensional case with cost of the form \(m(y) (y')^2\) on a domain that is an interval, the paths had only one possible direction to go, and so the distinction between the initial-value and boundary-value problems was not important. While this is not true in general for functionals of multi-dimensional vectorial paths \(\textbf{q}(t) \in \mathbb {R}^N\), special cases still do exist in which global optimality can be proved. For example, when \(M(\textbf{q}) = M_0\) is constant, or when \(m_{ij}(\textbf{q}) = m_{i}(q_i) \delta _{ij}\), and the domain is a product of intervals so that \(\textbf{q}\in [0,1]^N\), the resulting boundary value problem will be globally minimized by the solution of the Euler–Lagrange equations. Global optimality is verified by evaluating the cost of \(\textbf{q}(t) = \textbf{q}^*(t) + \varvec{\epsilon }(t)\) where \(\varvec{\epsilon }(0) = \varvec{\epsilon }(1) = \textbf{0}\). As with the case of the line, the effect of adding \(\varvec{\epsilon }(t)\) is that the cost will only increase relative to that for \(\textbf{q}^*(t)\).

3.1 Bootstrapping globally optimal Euler–Lagrange solutions

Let

$$\begin{aligned} J_0[\varvec{\theta },\textbf{q}] \,=\, \int _{0}^{1} f_0(\varvec{\theta },\dot{\varvec{\theta }},\textbf{q},\dot{\textbf{q}},t) dt \end{aligned}$$

where

$$\begin{aligned} f_0(\varvec{\theta },\dot{\varvec{\theta }},\textbf{q},\dot{\textbf{q}},t) \,\doteq \, \frac{1}{2}\Vert \dot{\varvec{\theta }} - A(\textbf{q},t)\dot{\textbf{q}}\Vert _W^2, \end{aligned}$$
(35)

\(\textbf{q}\in \mathbb {R}^N\), \({\varvec{\theta }} \in \mathbb {R}^P\), and \(\Vert B\Vert _W^2 = \textrm{tr}(B^T W B)\) is the square of the weighted Frobenius norm with the weighting defined by a given constant positive definite symmetric matrix \(W=W^T >0\).

If \(\textbf{q}(t)\) is fixed in advance, then the globally minimal solution for \(\varvec{\theta }(t)\) is obviously defined by \(\dot{\varvec{\theta }} = A(\textbf{q},t)\dot{\textbf{q}}\). This is true regardless of whether E–L regularity conditions are imposed on A or not.

Alternatively, if boundary conditions on \(\varvec{\theta }\) are imposed and if A does satisfy E–L regularity conditions (e.g., \(A:\mathbb {R}^N \times [0,1] \,\longrightarrow \, \mathbb {R}^{P \times N}\) is twice continuously differentiable), then the E–L equations in \(\varvec{\theta }\) give

$$\begin{aligned} \frac{d}{dt} \left\{ W\left[ \dot{\varvec{\theta }} - A(\textbf{q},t)\dot{\textbf{q}}\right] \right\} \,=\, \textbf{0} \end{aligned}$$

or

$$\begin{aligned} \dot{\varvec{\theta }} - A(\textbf{q},t) \dot{\textbf{q}} \,=\, \textbf{a}, \end{aligned}$$
(36)

where the vector \(\textbf{a}\) is constant. Integrating again gives another constant \(\textbf{b}\) and the solution is \(\varvec{\theta }(t) = \varvec{\theta }_{[\textbf{q}]}^*(t \,|\, \textbf{a},\textbf{b})\) where

$$\begin{aligned} \varvec{\theta }_{[\textbf{q}]}^*(t \,|\, \textbf{a},\textbf{b}) \,\doteq \, \textbf{a}t + \textbf{b} + \int _{0}^{t} A(\textbf{q}(\tau ),\tau ) \dot{\textbf{q}}(\tau ) d\tau . \end{aligned}$$
(37)

Here \(*\) denotes that this is the extremal solution of the functional \(J_0[\varvec{\theta },\textbf{q}]\), the subscript denotes that this solution is a functional of \(\textbf{q}\), and the solution is conditioned on boundary conditions specified by \(\textbf{a}\) and \(\textbf{b}\). When no boundary conditions are imposed, \(\varvec{\theta }_{[\textbf{q}]}^*(t \,|\, \textbf{0},\textbf{b})\) makes \(f_0 = 0\), which is the global minimum value and is independent of how the differentiable \(\textbf{q}(t)\) is chosen. When \(\textbf{a} \ne \textbf{0}\), \(\varvec{\theta }_{[\textbf{q}]}^*(t \,|\, \textbf{a},\textbf{b})\) has sufficient freedom to match boundary conditions, and for any fixed given \(\textbf{q}(t)\) this choice of \(\varvec{\theta }(t)\) is again the global minimum since a perturbation of the form \(\varvec{\theta }_{[\textbf{q}]}^*(t \,|\, \textbf{a},\textbf{b}) \,\longrightarrow \, \varvec{\theta }_{[\textbf{q}]}^*(t \,|\, \textbf{a},\textbf{b}) + \varvec{\epsilon }(t)\) with \(\varvec{\epsilon }(0) = \varvec{\epsilon }(1) = \textbf{0}\) will only increase the value of the quadratic functional. But when \(\textbf{a} \ne \textbf{0}\), the choice of \(\textbf{q}(t)\) is not completely free for reasons explained below.

Assuming sufficient regularity,

$$\begin{aligned} \frac{d}{dt}\left( \frac{\partial f_0}{\partial \dot{q}_i}\right) - \frac{\partial f_0}{\partial {q_i}} \,=\, - \left( \frac{dA}{dt} \textbf{e}_i\right) ^T W(\dot{\varvec{\theta }} - A\dot{\textbf{q}}) - (A \textbf{e}_i)^T W \frac{d}{dt}(\dot{\varvec{\theta }} - A\dot{\textbf{q}}) + \left( \frac{\partial A}{\partial q_i} \dot{\textbf{q}}\right) ^T W(\dot{\varvec{\theta }} - A\dot{\textbf{q}}) . \end{aligned}$$
(38)

Substituting in (36) gives

$$\begin{aligned} \frac{d}{dt}\left( \frac{\partial f_0}{\partial \dot{q}_i}\right) - \frac{\partial f_0}{\partial {q_i}} \,=\, - \left( \frac{dA}{dt} \textbf{e}_i\right) ^T W \textbf{a} \,+\, \left( \frac{\partial A}{\partial q_i} \dot{\textbf{q}}\right) ^T W \textbf{a} . \end{aligned}$$
(39)

Consequently, if \(\textbf{a} = \textbf{0}\), these equations vanish. Otherwise, equating the above expression to zero provides a constraint on \(\textbf{q}(t)\).

That said, the goal in these computations is not to derive E–L Equations in \(\textbf{q}\) since \(J_0[\varvec{\theta },\textbf{q}]\) is a degenerate variational problem (in the sense that the mass matrix is singular). Rather, the above will be used in the “bootstrapping” procedure described below.

Now suppose that

$$\begin{aligned} f_1(\textbf{q},\dot{\textbf{q}},t) = \frac{1}{2}\dot{\textbf{q}}^T M(\textbf{q}) \dot{\textbf{q}} \end{aligned}$$
(40)

is given for which the corresponding Euler–Lagrange equations globally solve a variational problem with \(\textbf{q}(0)\) and \(\textbf{q}(1)\) specified. Here it is shown that this solution then can be used as a seed to generate a globally optimal solution to a larger variational problem. This is referred to here as “bootstrapping”. Define

$$\begin{aligned} f_2(\varvec{\theta },\dot{\varvec{\theta }},\textbf{q},\dot{\textbf{q}},t) = f_1(\textbf{q},\dot{\textbf{q}},t) + f_0(\varvec{\theta },\dot{\varvec{\theta }},\textbf{q},\dot{\textbf{q}},t) \end{aligned}$$
(41)

For example, \(f_1\) could correspond to the situation in Sect. 2.1. This scenario is akin to the “warped product” defined in [24] and the “Kaluza-Klein Lagrangian” described in [25], though it is more general in that it does not rely on underlying geometric properties, even though the result has geometric implications.

In the following let

$$\begin{aligned} J_1[\textbf{q}] \,\doteq \, \int _{0}^{1} f_1(\textbf{q},\dot{\textbf{q}},t) dt \,\,\,\textrm{and}\,\,\, J_2[\varvec{\theta },\textbf{q}] \,=\, \int _{0}^{1} f_2(\varvec{\theta },\dot{\varvec{\theta }},\textbf{q},\dot{\textbf{q}},t) dt. \end{aligned}$$
(42)

Theorem 3.1

If \(f_1\) is sufficiently regular such that the E–L equations provide extremals, and if \(\textbf{q}^*\) is the global minimizing extremal of \(J_1\), then when boundary conditions are not imposed on \(\varvec{\theta }\), a family of global minimizers of \(J_2\) exists and is of the form \((\textbf{q}^*, \varvec{\theta }_{[\textbf{q}^*]}^*(t \,|\, \textbf{0},\textbf{b}))\) where \(\textbf{q}^*(t)\) is the global minimizer of \(J_1\) subject to specified \(\textbf{q}(0)\) and \(\textbf{q}(1)\) and \(\textbf{b} \in \mathbb {R}^P\) parameterizes the family.

Proof

By definition, in component form

$$\begin{aligned} \frac{d}{dt}\left( \frac{\partial f_1}{\partial \dot{q}_i}\right) - \frac{\partial f_1}{\partial {q_i}} \,=\, {0}, \end{aligned}$$
(43)

or explicitly when \(f_1(\textbf{q},\dot{\textbf{q}},t) = \frac{1}{2}\dot{\textbf{q}}^T M(\textbf{q}) \dot{\textbf{q}}\),

$$\begin{aligned} \textbf{e}_i^T M(\textbf{q}) \ddot{\textbf{q}} \,+\, \sum _k \dot{q}_k \textbf{e}_i^T \frac{\partial M}{\partial q_k} \dot{\textbf{q}} - \frac{1}{2}\dot{\textbf{q}}^T \frac{\partial M}{\partial q_i} \dot{\textbf{q}} \,=\, {0}, \end{aligned}$$

(though restricting to this form is not required in the proof).

Let the solution to the system of equations (43) subject to boundary conditions \(\textbf{q}(0)=\textbf{q}_0\) and \(\textbf{q}(1)=\textbf{q}_1\) be denoted as \(\textbf{q}^*(t)\).

On the other hand, evaluating

$$\begin{aligned} \frac{d}{dt}\left( \frac{\partial f_2}{\partial \dot{\theta }_i}\right) - \frac{\partial f_2}{\partial {\theta _i}} \,=\, {0} \end{aligned}$$

for all values of i simply results in (37).

The Euler–Lagrange equations for the new system corresponding to coordinates \(\{q_i\}\), written in component form, will be

$$\begin{aligned} \frac{d}{dt}\left( \frac{\partial f_2}{\partial \dot{q}_i}\right) - \frac{\partial f_2}{\partial {q_i}} \,=\, {0}. \end{aligned}$$

From this it is clear that all of the \(f_1\) terms disappear when evaluated at \(\textbf{q}= \textbf{q}^*\), and if \(\textbf{a} = \textbf{0}\), then (39) gives that \(\varvec{\theta }_{[\textbf{q}^*]}^*(t \,|\, \textbf{0},\textbf{b})\) causes all other terms to vanish. This means that \((\textbf{q}^*(t), \varvec{\theta }_{[\textbf{q}^*]}^*(t \,|\, \textbf{0},\textbf{b}))\) is an E–L extremizer of \(J_2\).

The global minimality of the solutions \((\textbf{q}^*(t), \varvec{\theta }_{[\textbf{q}^*]}^*(t \,|\, \textbf{0},\textbf{b}))\) is guaranteed by the postulated condition that \(\textbf{q}^*(t)\) is obtained a priori, and the global minimality of \(\varvec{\theta }_{[\textbf{q}^*]}^*(t \,|\, \textbf{0},\textbf{b})\) in (37) as a solution to a quadratic cost problem. That is, if \(\textbf{q}^*(t) \,\rightarrow \, \textbf{q}^*(t) + \varvec{\epsilon }_1(t)\) it will only increase the cost of \(J_1\) without affecting the cost of \(J_0\), since \(\varvec{\theta }^*(t)\) is keyed to changes in \(\textbf{q}(t)\). Therefore, if \(\varvec{\theta }(t) = \varvec{\theta }_{[\textbf{q}^* + \varvec{\epsilon }_1]}^*(t \,|\, \textbf{0},\textbf{b}) + \varvec{\epsilon }_2(t)\) is substituted into the cost function, \(\varvec{\epsilon }_1(t)\) will have no effect on \(J_0\), which will remain zero, and the addition of \(\varvec{\epsilon }_2(t)\) only increases the cost from zero to \(\int _{0}^{1} \Vert \dot{\varvec{\epsilon }}_2\Vert _W^2 dt\). This does not depend on \(\Vert \dot{\varvec{\epsilon }}_2\Vert _W\) being small. (This is the same as the argument in (7) but with \(\textbf{a}=\textbf{0}\).) That is, the \(f_0\) term in \(f_2\) is uniquely zero when \(\textbf{a} = \textbf{0}\), and there is no way to do better.\(\square \)

Note that when \(\textbf{a} = \textbf{0}\) it is only possible to specify boundary conditions on \(\varvec{\theta }^{\,*}\) at one end. If \(\textbf{a} \ne \textbf{0}\) more freedom is allowed, but in general \(\textbf{q}^*\) will no longer satisfy the E–L equations for \(J_2\) due to the nonzero terms contributed from \(J_0\). However, in some special cases those \(J_0\) terms vanish, as described below.

Theorem 3.2

Two special cases for which global minimality is guaranteed without making the restriction \(\textbf{a} = \textbf{0}\) are: (1) When \(A(\textbf{q},t) = A_0\) is constant; (2) When \(N = 1\) and \(A(q,t) = \textbf{a}_1(q) \in \mathbb {R}^{P \times 1}\) with q being scalar. The globally minimal solutions in these two cases are respectively \((\textbf{q}^*, \varvec{\theta }_{\textbf{q}^*}^{*}(t\,|\,\textbf{a},\textbf{b}))\) and \((q^*,\varvec{\theta }_{[q^*]}^{*}(t\,|\,\textbf{a},\textbf{b}))\) where (Footnote 5)

$$\begin{aligned} \varvec{\theta }_{\textbf{q}^*}^{*}(t\,|\,\textbf{a},\textbf{b}) \,\doteq \, \textbf{a}t + \textbf{b} + A_0\left( \textbf{q}^*(t) - \textbf{q}^*(0)\right) \end{aligned}$$
(44)

and

$$\begin{aligned} \varvec{\theta }_{[q^*]}^{*}(t\,|\,\textbf{a},\textbf{b}) \,\doteq \, \textbf{a}t + \textbf{b} + \int _{0}^{t} \textbf{a}_1(q^*(s)) \dot{q}^*(s)ds \end{aligned}$$
(45)

satisfies the expanded E–L equations corresponding to (41) and can satisfy all boundary conditions \(\varvec{\theta }(0)\) and \(\varvec{\theta }(1)\) in addition to those in \(\textbf{q}\) or q.

Proof

If \(\textbf{a} \ne \textbf{0}\) in (36) then the necessary conditions established by the E–L equations become (39). If \(A(\textbf{q}) = A_0\) is constant, then the terms in (39) vanish and the case 1 solution results. This condition also can be satisfied if the dimension of \(\textbf{q}\) is 1, wherein both terms in parentheses in (39) become the same and hence cancel. In this second case, denoting \(A = \textbf{a}_1\) (since it consists of a single column vector) gives (45).

Global optimality is ensured as before, by introducing a perturbation \(\varvec{\epsilon }(t)\) of any size satisfying the boundary conditions \(\varvec{\epsilon }(0) = \varvec{\epsilon }(1) = \textbf{0}\).

The class of problems addressed in this section can be viewed in a slightly different way when (40) holds by rewriting (41) as

$$\begin{aligned} f(\textbf{q},\varvec{\theta },\dot{\textbf{q}},\dot{\varvec{\theta }},t) = \frac{1}{2}\left[ \begin{array}{c} \dot{\textbf{q}} \\ \dot{\varvec{\theta }} \end{array}\right] ^T \left( \begin{array}{cc} M(\textbf{q}) \,+\, A^T(\textbf{q},t) W A(\textbf{q},t) &{}\quad -A^T(\textbf{q},t) W \\ -W A(\textbf{q},t) &{}\quad W \end{array}\right) \left[ \begin{array}{c} \dot{\textbf{q}} \\ \dot{\varvec{\theta }} \end{array}\right] . \end{aligned}$$
(46)

Consequently, a quadratic cost that can be decomposed in this way will have solutions that inherit the global optimality from the original smaller problem under appropriate boundary conditions. It also means that the above reasoning can be used recursively. If the matrix in (46) is called \(M'(\textbf{q})\) and we define \(\textbf{q}' \doteq [\textbf{q}^T, \varvec{\theta }^T]^T\), then the problem can be expanded further by adding an additional term such as \(\Vert \dot{\varvec{\phi }} - A'(\textbf{q}',t) \dot{\textbf{q}}'\Vert _{W'}^{2}\).
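
As a quick numerical sanity check of this decomposition, the following sketch (illustrative sizes, with randomly generated \(M\), \(A\), \(W\) evaluated at one fixed \((\textbf{q},t)\); none of these variable names come from the paper) verifies that the block quadratic form in (46) reproduces the two-term cost \(\frac{1}{2}\dot{\textbf{q}}^T M \dot{\textbf{q}} + \frac{1}{2}\Vert \dot{\varvec{\theta }} - A\dot{\textbf{q}}\Vert _W^2\):

```python
# Sketch: verify that the block matrix M'(q) of (46) reproduces
# f_2 = (1/2) qdot^T M qdot + (1/2) || thetadot - A qdot ||_W^2  at one point.
import numpy as np

rng = np.random.default_rng(0)
N, P = 3, 4                                   # dim(q), dim(theta); illustrative
M = np.diag(rng.uniform(1.0, 2.0, N))         # M(q) > 0 at some fixed q
W = np.diag(rng.uniform(1.0, 2.0, P))         # weighting matrix W > 0
A = rng.standard_normal((P, N))               # A(q, t) at the same (q, t)

qdot = rng.standard_normal(N)
thetadot = rng.standard_normal(P)

# direct evaluation of the two-term cost
r = thetadot - A @ qdot
f2_direct = 0.5 * qdot @ M @ qdot + 0.5 * r @ W @ r

# evaluation via the expanded block matrix of (46)
Mprime = np.block([[M + A.T @ W @ A, -A.T @ W],
                   [-W @ A,           W      ]])
v = np.concatenate([qdot, thetadot])
f2_block = 0.5 * v @ Mprime @ v

assert np.isclose(f2_direct, f2_block)
```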

3.2 Bootstrapping globally optimal solutions that lack regularity

If the functional \(f_1\) or the resulting global optimizers lack sufficient regularity for the E–L equations to be applicable, the bootstrapping process can nevertheless be used in some cases to generate globally optimal solutions on higher dimensional spaces.

For example, if \(f_1(q,\dot{q}) = m(q) \dot{q}^2\) and m is as in Theorem 2.2, then the globally minimal solution of the variational problem with functional

$$\begin{aligned} f_2(q,\dot{q}, \theta , \dot{\theta }) = m(q) \dot{q}^2 + [\dot{\theta } - a(q,t) \dot{q}]^2 \end{aligned}$$

with \(t \in [0,1]\) and \(a:\mathbb {R}\times [0,1] \rightarrow \mathbb {R}\) continuous and bounded will be \(q^*(t)\) from Theorem 2.2, and \(\theta ^*(t) = a_0t + b_0 + \int _{0}^{t} a(q^*(s),s)\dot{q}^*(s)ds\). This is true even if m is not differentiable, and hence lacks the regularity to use the E–L equations. Moreover, in cases such as this where q and \(\theta \) are both one-dimensional, when \(a=a(q)\) is autonomous, the integral defining \(\theta ^*\) can be rewritten as

$$\begin{aligned} \theta ^*(t) = a_0t + b_0 + \int _{q^*(0)}^{q^*(t)} a(q) dq, \end{aligned}$$
(47)

making it an explicit function of \(q^*(t)\).
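
As a concrete illustration (a sketch with a made-up \(q^*\), \(a\), and boundary constants; none of these are from the paper), \(\theta ^*(t)\) in (47) can be evaluated directly from \(q^*(t)\) and an antiderivative of \(a\):

```python
# Sketch of (47): build theta*(t) explicitly from a given q*(t) and autonomous a(q).
import numpy as np

a0, b0 = 0.5, 1.0                  # boundary-condition constants (hypothetical)
a = lambda q: np.cos(q)            # any continuous, bounded a(q)
A = lambda q: np.sin(q)            # an antiderivative of a, so int_{q0}^{q1} a = A(q1) - A(q0)
q_star = lambda t: t**2            # stand-in for the optimizer of the J_1 problem

t = np.linspace(0.0, 1.0, 201)
theta_star = a0 * t + b0 + (A(q_star(t)) - A(q_star(0.0)))
```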

As a second example, we can bootstrap off of the Maniá example of the Lavrentiev phenomenon in (8) and seek a global minimum of

$$\begin{aligned} J[x,\theta ] = \int _{0}^{1} \left\{ (x^3 - t)^2 \dot{x}^6 + (\dot{\theta } - a(x,t) \dot{x})^2 \right\} dt \end{aligned}$$
(48)

subject to boundary conditions \(x(0) = 0\) and \(x(1) = 1\). We can simply write \(x^*(t) = t^{1/3}\) and \(\theta ^*(t) = a_0 t + b_0 + \int _{0}^{t} a(x^*(s),s) \dot{x}^*(s) ds\) for a wide variety of functions a, as long as the integral defining \(\theta ^*(t)\) exists and is finite.

That said, most of the applications that follow involve bootstrapping in the case when the paths and functional are regular.

4 Examples from the theory of framed space curves

In the classical theory of curves in three-dimensional space, the Frenet–Serret Apparatus is used to frame arclength-parametrized curves. In the first subsection of this section, a brief review of that classical theory is given, adapted from Chapter 5 of the author’s book [22]. This material is classical (dating back 150 years) and can be found stated using other notation in [9, 26]. The material is presented here primarily because subtle differences in the notation used here relative to the standard presentation will be convenient later.

4.1 The Frenet–Serret apparatus

Without loss of generality, every curve can be parameterized by its arclength, s. Associated with every three-times-differentiable arclength-parameterized curve, \(\textbf{x}(s) \in \mathbb {R}^3\), are two functions: curvature, \(\kappa (s)\), and torsion, \(\tau (s)\). These functions completely describe the shape of the curve up to arbitrary rigid-body motion.

As long as \(\kappa (s) \ne 0\), a unique Frenet frame can be attached at every point on the curve. The orientation of this frame relative to the world frame will be

$$\begin{aligned} {R}_{FS}(s) \,\doteq \, [\textbf{t}(s), \textbf{n}(s), \textbf{b}(s)] \in SO(3) \end{aligned}$$

where \(\textbf{t}(s)\), \(\textbf{n}(s)\), and \(\textbf{b}(s)\) are respectively the unit tangent, normal, and binormal. The Frenet frames, the curvature, and the torsion, together are known as the Frenet–Serret apparatus.

The way that the orientation of a Frenet frame evolves along the curve is tied to the curvature and torsion through the system of first order differential equations

$$\begin{aligned} \frac{d{R}_{FS}}{ds} = - {R}_{FS} {\Omega _{FS} } \end{aligned}$$
(49)

where \({\Omega _{FS}}\) is a special skew-symmetric matrix with only two independent nonzero entries

$$\begin{aligned}{\Omega _{FS}(s) } \,=\, \left( \begin{array}{ccc} 0 &{}\quad \kappa (s) &{}\quad 0 \\ -\kappa (s) &{}\quad 0 &{}\quad \tau (s) \\ 0 &{}\quad -\tau (s) &{}\quad 0 \end{array} \right) . \end{aligned}$$

The Frenet frame at each value of arclength is then the rotation-translation pair \(({R}_{FS}(s), \textbf{x}(s)) \in SO(3) \times \mathbb {R}^3\). No group law is defined. Later in this article several possibilities are explored.
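
For readers who want to compute Frenet frames numerically, the following minimal sketch (the helper names are mine, not the paper's) propagates (49) for given \(\kappa (s)\) and \(\tau (s)\); a circular helix, which has constant curvature and torsion, is used as the example:

```python
# Sketch: integrate dR/ds = -R * Omega_FS(s) from (49) for given kappa(s), tau(s).
import numpy as np
from scipy.integrate import solve_ivp

def omega_fs(kappa, tau):
    """The skew-symmetric matrix Omega_FS(s) appearing in (49)."""
    return np.array([[0.0,    kappa, 0.0],
                     [-kappa, 0.0,   tau],
                     [0.0,   -tau,   0.0]])

kappa = lambda s: 1.0     # circular helix: constant curvature
tau   = lambda s: 0.25    # and constant torsion

def rhs(s, r_flat):
    R = r_flat.reshape(3, 3)
    return (-R @ omega_fs(kappa(s), tau(s))).ravel()

R0 = np.eye(3)                                              # frame at s = 0
sol = solve_ivp(rhs, (0.0, 1.0), R0.ravel(), dense_output=True, rtol=1e-9)
R_FS = lambda s: sol.sol(s).reshape(3, 3)                   # numerical R_FS(s)
```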

4.2 Minimally varying frames

The classical Frenet frames are not the only way to attach reference frames to space curves, as pointed out by Bishop [27]. Moreover, the Frenet frames are not optimal in the sense of twisting as little as possible around the tangent as the curve is traversed. The word ‘twist’ is somewhat problematic, as it can be confused with torsion. Whereas torsion is a twisting of the curve, the twisting of the frame attached to the curve does not affect the curve shape. For this reason, the word “roll” can be used in place of “twist”, which is consistent with the terminology used to describe motions of aircraft and ships.

Two ways to modify Frenet frames are described here, which demonstrate the variational formulation from earlier in this paper. First the optimal amount of roll along the tangent relative to the Frenet frames is established. Next, alternative parameterizations in place of arclength of the curve that have optimality properties are examined. Finally, the optimal set of frames and reparameterization are combined, which represents an example of the bootstrapping described previously.

Prior to conceiving the general concept of bootstrapping globally optimal variational problems, the author considered special cases that arose from the field of robotics [28, 29], as reviewed in [21, 22], from which some of this presentation is adapted.

4.2.1 Minimally rolling frames

Consider the Frenet frames as a start, and allow freedom to roll around the tangent so that the new frames have orientation

$$\begin{aligned} R = R_{FS}\, R_1(\theta )\,\,\,\textrm{where}\,\,\, R_1(\theta ) \,=\, \left( \begin{array}{ccc} 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad \cos \theta &{}\quad -\sin \theta \\ 0 &{}\quad \sin \theta &{}\quad \cos \theta \end{array} \right) . \end{aligned}$$

The optimal \(\theta (s)\) will minimize the functional

$$\begin{aligned} J = \frac{1}{2}\int _{0}^{1} \textrm{tr}\left( \frac{dR}{ds} \frac{dR}{ds}^{T}\right) \,ds. \end{aligned}$$
(50)

Computing the derivatives and expanding out gives the explicit expression

$$\begin{aligned} \frac{1}{2}\textrm{tr}\left( \frac{dR}{ds} {\frac{dR}{ds}}^{T}\right) = \kappa ^2 + \tau ^2 + 2 \frac{d\theta }{ds} \textbf{e}_1 \cdot {\varvec{\omega }}_{FS} + \left( \frac{d\theta }{ds}\right) ^2 \end{aligned}$$
(51)

where

$$\begin{aligned} {\varvec{\omega }}_{FS} \,\doteq \, \left( \begin{array}{c} \tau \\ 0 \\ \kappa \end{array} \right) . \end{aligned}$$

This is obtained by using the properties of the trace. For example, \(\textrm{tr}\left( \frac{dR}{ds} {\frac{dR}{ds}}^{T}\right) = \textrm{tr}\left( R^T \frac{dR}{ds} {\frac{dR}{ds}}^{T} R\right) \) which is the trace of the product of two skew-symmetric matrices. And for skew-symmetric matrices \(\Omega _1\) and \(\Omega _2\)

$$\begin{aligned} \frac{1}{2}\textrm{tr}\left( \Omega _1 \Omega _2 \right) = {\varvec{\omega }}_1 \cdot {\varvec{\omega }}_2 \end{aligned}$$

where \(\varvec{\omega }_i\) is the unique vector such that

$$\begin{aligned} \varvec{\omega }_i \times \textbf{v} \,=\, \Omega _i \textbf{v} \end{aligned}$$

for every \(\textbf{v} \in \mathbb {R}^3\). This relationship between \(\varvec{\omega }_i\) and \(\Omega _i\) is denoted as

$$\begin{aligned} {\varvec{\omega }}_i \,=\, (\Omega _i)^{\vee } \,\,\,\textrm{and}\,\,\, \Omega _i \,=\, \hat{\varvec{\omega }}_i\,. \end{aligned}$$
(52)

From (51) and the expression for \({\varvec{\omega }}_{FS}\) it is clear that

$$\begin{aligned} \frac{1}{2}\,\textrm{tr}\left( \frac{dR}{ds} {\frac{dR}{ds}}^{T}\right) = \kappa ^2 + \left( \tau + \frac{d\theta }{ds}\right) ^2, \end{aligned}$$
(53)

and the cost functional is minimized when

$$\begin{aligned} \frac{d\theta }{ds} = -\tau . \end{aligned}$$

Though this had been known for many decades, as explained by Bishop [27], it became known to the author as part of studies on snakelike robot arms [28, 29]. In that work the goal was to use a “backbone curve” to capture the overall shape of a physical robot made from joints and links. Since each section of the robot was commanded to adhere to a section of the curve, limiting the amount of change in position and orientation between the reference frames at the ends of each section of curve made it easier for the robot to conform.

The above defines frames that have globally minimal roll. In some applications it is required to satisfy conditions at the end points. Such solutions will be of the form

$$\begin{aligned} \theta ^*(s) = c_1 + c_2 s - \int _{0}^{s} \tau (\sigma )\,d\sigma \end{aligned}$$

where \(c_1\) and \(c_2\) are determined by the end conditions \(\theta (0) = \theta _0\) and \(\theta (1) = \theta _1\).

The fact that this is globally optimal follows from the same arguments as in the example of the line (i.e., let \(\theta (s) = \theta ^*(s) + \epsilon (s)\) and show that this can only make things worse).
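
Numerically, the minimally rolling frame angle is straightforward to construct from samples of the torsion. The sketch below (an illustrative torsion profile and end conditions, not taken from the paper) fixes \(c_1\) and \(c_2\) from \(\theta (0)=\theta _0\) and \(\theta (1)=\theta _1\):

```python
# Sketch: theta*(s) = c1 + c2 s - int_0^s tau(sigma) d sigma, with c1, c2 set by the ends.
import numpy as np

s = np.linspace(0.0, 1.0, 501)
tau = 0.25 + 0.1 * np.sin(2.0 * np.pi * s)     # illustrative torsion profile
theta0, theta1 = 0.0, 0.3                      # prescribed end conditions

# cumulative trapezoidal integral of tau from 0 to s
int_tau = np.concatenate(([0.0],
                          np.cumsum(0.5 * (tau[1:] + tau[:-1]) * np.diff(s))))

c1 = theta0
c2 = theta1 - theta0 + int_tau[-1]             # so that theta*(1) = theta1
theta_star = c1 + c2 * s - int_tau
```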

4.2.2 Optimal reparameterization for least variation

Within the same motivating snakelike robot application, it was reasoned that if the spacing along the curve was relaxed from being arclength, then this would provide additional freedom for the physical robot to conform to the backbone curve. In analogy with how the Frenet frames are not the only (or perhaps not even the most desirable) orientations to attach to a curve, arclength may not be the best parameter to use. In optimal reparameterization, the idea is to replace the arclength, s, with a new curve parameter t, where s(t) satisfies \(s(0) =0\) and \(s(1) = 1\), such that a cost functional of the form

$$\begin{aligned} J = \int _{0}^{1} \left\{ \frac{1}{2}\,r^2\, \textrm{tr}\left( \frac{d R(s(t))}{dt} \frac{d R^{T}(s(t))}{dt}\right) + \left( \frac{d s}{dt}\right) ^2 \right\} \,dt \end{aligned}$$

is minimized. The constant r is used to introduce units of length, since bending is measured in angular units whereas extension is measured in units of length. For example, if a ball of radius r and unit mass is traversing this trajectory, J is a measure of the total rotational and translational kinetic energy of the ball integrated over the trajectory. If it speeds up or slows down more quickly than the optimal solution, this cost will increase. This idea will be generalized later in the paper.

The integrand in the above functional is of the form \( m(s) ({s'})^2\) where \(s' = ds/dt\) and

$$\begin{aligned} m(s) = \frac{1}{2}\,r^2\, \textrm{tr}\left( \frac{d R}{ds} \frac{d R^{T}}{ds}\right) + 1, \end{aligned}$$

and therefore the 1D globally optimal solution discussed previously in Theorem 2.2 applies. Call this solution \(s^*(t)\).
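
A numerical sketch of \(s^*(t)\) is given below. It assumes that the globally minimal solution of a functional with integrand \(m(s)\dot{s}^2\) travels at constant speed in the metric \(\sqrt{m(s)}\,ds\) (this is what the Cauchy–Schwarz inequality yields for this integrand), so \(s^*(t)\) is obtained by inverting the normalized cumulative integral of \(\sqrt{m}\); the profiles of \(\kappa \), \(\tau \), and the value of \(r\) are illustrative:

```python
# Sketch: s*(t) obtained by equidistributing the sqrt(m(s))-weighted arclength,
# with m(s) = r^2 (kappa^2 + tau^2) + 1 for the Frenet frames (cf. (51) with theta' = 0).
import numpy as np

s_grid = np.linspace(0.0, 1.0, 1001)
kappa = 1.0 + 0.5 * np.sin(2.0 * np.pi * s_grid)   # illustrative curvature profile
tau = 0.25 * np.ones_like(s_grid)                  # illustrative torsion profile
r = 0.1
m = r**2 * (kappa**2 + tau**2) + 1.0

sqrt_m = np.sqrt(m)
u = np.concatenate(([0.0],
                    np.cumsum(0.5 * (sqrt_m[1:] + sqrt_m[:-1]) * np.diff(s_grid))))
u /= u[-1]                                         # normalized: u(0) = 0, u(1) = 1

t_grid = np.linspace(0.0, 1.0, 201)
s_star = np.interp(t_grid, u, s_grid)              # invert u(s) = t to obtain s*(t)
```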

4.3 Simultaneous optimal reparameterization and roll modification

In the physical robotics problem discussed earlier, it was desirable to combine the optimal roll and curve reparametrization results. This can be done by first solving for the optimal roll and setting R(s), followed by performing the optimal curve reparametrization, resulting in the frame \((R_{FS}(s^*(t))\, R_1(\theta ^*(s^*(t))),\,\textbf{x}(s^*(t)))\). A natural question to ask is whether this composition of optimal solutions to subproblems optimally solves the larger composite problem. In general one would expect the composition of optimal solutions to smaller problems to be suboptimal on the larger space. But in the case of optimal curve reparametrization and rolling, the directions are in a sense orthogonal and commuting (local rolling and extension about/along the same vector).

Moreover, this problem of simultaneous curve reparameterization and optimal roll distribution serves as a concrete application of the class of bootstrapping problems discussed earlier in this paper. Start with a curve initially parameterized by arclength, \(\textbf{x}(s)\) for \(s \in [0,1]\), and with the Frenet frames \([\textbf{x}(s), R_{FS}(s)]\). We seek a new set of smoothly evolving reference frames of the form \([\textbf{x}(t), R(t)] = [\textbf{x}(s(t)), R_{FS}(s(t)) R_1(\theta (s(t)))]\), where \(R_1(\theta )\) is an added roll of the Frenet frames about the tangent, together with a reparameterization \(s=s(t)\).

A cost function in the two variables \((s(t),\theta (t))\) can be constructed of the form

$$\begin{aligned} C\doteq & {} \frac{1}{2}\int _{0}^{1} \left\{ \frac{1}{2}r^2 \textrm{tr}(\dot{R} \dot{R}^T) + \dot{\textbf{x}} \cdot \dot{\textbf{x}} \right\} dt \end{aligned}$$
(54)
$$\begin{aligned}= & {} \frac{1}{2}\int _{0}^{1} \left\{ (r^2 \kappa ^2(s) + 1) \dot{s}^2 + r^2(\tau (s) \dot{s} + \dot{\theta })^2 \right\} dt. \end{aligned}$$
(55)

This functional is of the same form as (41), with \(\theta \) taking the place of \(\varvec{\theta }\) and s taking the place of \(\textbf{q}\). If the second term in the integral is zero, then the remaining term is a one-dimensional variational problem with \(f(s,\dot{s},t) = \frac{1}{2}m(s) (\dot{s})^2\) for which a globally minimal solution can be obtained. From the previous discussion about bootstrapping, Theorem 3.2 guarantees the global optimality of the composite problem.

What is also interesting here is that the two problems (optimal roll distribution for an arclength-parameterized curve and optimal curve reparametrization) can be considered sequentially and then merged, and this merged solution will be the same as the joint globally optimal one. Moreover, this example serves as a model for far more general problems with these characteristics.

5 Global optimality in the Lie group setting

Let G be a real matrix Lie group. The dimension of G as a manifold is denoted as N, and the constituent matrices, \(g \in G\), are elements of \(\mathbb {R}^{n\times n}\).

The variational calculus problem on matrix Lie groups can be formulated as that of finding paths \(g(t) \in G\) that extremize a functional

$$\begin{aligned} J = \int _{t_1}^{t_2} f(g;\varvec{\xi };t)dt \end{aligned}$$
(56)

where \(\varvec{\xi }= (g^{-1}\dot{g})^{\vee }\) is a kind of velocity, which is computed simply as the matrix product of \(g^{-1}\) and \(\dot{g}\), the latter of which is not an element of G. It can be shown that \(g^{-1}\dot{g} \in \mathcal{G}\), the Lie algebra corresponding to G, which has N basis elements \(\{E_i\}\), each of which is in \(\mathbb {R}^{n\times n}\). This means that

$$\begin{aligned} g^{-1}\dot{g} \,=\, \sum _{i=1}^{N} \xi _i E_i \end{aligned}$$

and a vector \(\varvec{\xi }= [\xi _1,...,\xi _N]^T \in \mathbb {R}^N\) can be assigned to each element of \(\mathcal{G}\). The notation \(\vee \) takes elements of \(\mathcal{G}\) and returns vectors in \(\mathbb {R}^N\), and its inverse is \(\hat{(\cdot )}\), which converts \(\varvec{\xi }\in \mathbb {R}^N\) into \(\sum _{i=1}^{N} \xi _i E_i \in \mathcal{G}\). An example of this was demonstrated for \(\mathcal{G} = so(3)\) in (43). The choice of \(\{E_i\}\) with the definition \(E_i^{\vee } = \textbf{e}_i\) induces an inner product on \(\mathcal{G}\)

$$\begin{aligned} (E_i,E_j) = \textbf{e}_i \cdot \textbf{e}_j = \delta _{ij} \end{aligned}$$

and fixes the metric for G.

Each \(g_i(t) \doteq \exp (t E_i)\) is a curve in G defining one parameter subgroups, where \(\exp (\cdot )\) is the matrix exponential. These curves are geodesics relative to the metric for which \((E_i,E_j) = \delta _{ij}\).
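
For concreteness, a minimal sketch of the \(\vee \) and \(\hat{(\cdot )}\) maps for \(\mathcal{G} = so(3)\) is given below (basis chosen so that \(E_i^{\vee } = \textbf{e}_i\); the helper names are mine, not the paper's):

```python
# Sketch: hat/vee maps for so(3), the induced orthonormal basis, and one-parameter subgroups.
import numpy as np
from scipy.linalg import expm

def hat(xi):
    """xi in R^3  ->  skew-symmetric matrix satisfying hat(xi) v = xi x v."""
    return np.array([[0.0,    -xi[2],  xi[1]],
                     [xi[2],   0.0,   -xi[0]],
                     [-xi[1],  xi[0],  0.0]])

def vee(Omega):
    """Inverse of hat."""
    return np.array([Omega[2, 1], Omega[0, 2], Omega[1, 0]])

E = [hat(e) for e in np.eye(3)]           # basis E_1, E_2, E_3 with E_i^vee = e_i

# (E_i, E_j) = e_i . e_j = delta_ij for this choice of basis
gram = np.array([[vee(Ei) @ vee(Ej) for Ej in E] for Ei in E])
assert np.allclose(gram, np.eye(3))

# one-parameter subgroups g_i(t) = exp(t E_i) are rotations about the i-th axis
g1 = expm(0.3 * E[0])
assert np.allclose(g1.T @ g1, np.eye(3))  # g1 is in SO(3)
```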

The necessary conditions for optimality in this setting are the Euler–Poincaré equations, which can be written as [21, 30,31,32]

$$\begin{aligned} \frac{d}{dt}\left( \frac{\partial f}{\partial \xi _i}\right) + \sum _{j,k=1}^{N} \frac{\partial f}{\partial \xi _k} C_{ij}^{k} \, \xi _j = \tilde{E}_{i} f. \end{aligned}$$
(57)

where \(C_{ij}^{k}\) are the structure constants for the Lie algebra \(\mathcal{G}\) relating the Lie bracket of basis elements as a linear combination of basis elements

$$\begin{aligned}{}[E_i,E_j] = \sum _{k=1}^{N} C_{ij}^{k} E_k . \end{aligned}$$

For a function \(f:G \rightarrow \mathbb {R}\),

$$\begin{aligned} (\tilde{E}_{i} f)(g) \,\doteq \, \left. \frac{d}{ds} f(g \circ \exp (s E_i))\right| _{s=0} \end{aligned}$$

is a directional derivative in the direction of the \(i^{th}\) one-dimensional subgroup. Throughout this section, exp denotes the matrix exponential, which when applied to Lie algebra elements produces Lie group elements. (In (57) \(f:G\times \mathcal{G} \times \mathbb {R}_{\ge 0} \,\rightarrow \, \mathbb {R}\), but the \(\tilde{E}_{i}\) is essentially a partial derivative, hence only the dependence on G is pertinent).

A version of the above Euler–Poincaré equation with additional end constraints imposed with Lagrange multipliers can be found in [33, 34] in the context of modeling DNA loops and thin steerable elastic tubes proposed for medical procedures.

5.1 Globally minimal solutions of the Euler–Poincaré equation

Consider the integrand of a cost functional

$$\begin{aligned} f(g,\varvec{\xi },t) = \frac{1}{2}(\varvec{\xi }- \textbf{c})^T K (\varvec{\xi }- \textbf{c}) = \frac{1}{2}\sum _{i,j=1}^{N} K_{ij} (\xi _i - c_i) (\xi _j - c_j) \end{aligned}$$

with constant \(K = K^T >0\). The absolute minimal value of f is obtained when \(\varvec{\xi }= \textbf{c}\). The corresponding globally minimal trajectory on the group is then defined by

$$\begin{aligned} g^{-1} \dot{g} \,=\, \sum _{i=1}^{N} c_i E_i . \end{aligned}$$

If \(\textbf{c}\) is a constant vector, then

$$\begin{aligned} g(t) \,=\, g(0) \exp \left( t \sum _{i=1}^{N} c_i E_i\right) . \end{aligned}$$

However, this solution is not able to satisfy arbitrary distal end conditions, such as specifying g(1). If the goal is to do that, then the Euler–Poincaré equations of the form

$$\begin{aligned} \frac{d}{dt}\left( \sum _{j=1}^{N} K_{ij} (\xi _j - c_j) \right) + \sum _{j,k=1}^{N} \left( \sum _{l=1}^{N} K_{kl} (\xi _l - c_l) \right) C_{ij}^{k} \xi _j = 0 \end{aligned}$$
(58)

need to be satisfied. In general, this requires numerical solution, but in some cases things simplify. In particular, consider the case when \(\textbf{c} = \textbf{0}\) and let

$$\begin{aligned} S_{lj}^{i} \doteq \sum _{k=1}^{N} K_{kl} C_{ij}^{k}. \end{aligned}$$
(59)

If \(S_{lj}^{i} = -S_{jl}^{i}\) then \(\sum _{j,l=1}^{N} S_{lj}^{i} \xi _l \xi _j = 0\) and (58) reduces to

$$\begin{aligned} K \dot{\varvec{\xi }} = \textbf{0} \,\, \Longrightarrow \,\, \varvec{\xi }(t) = \varvec{\xi }(0) \,\, \Longrightarrow \,\, g(t) = g(0) \circ e^{t \hat{\varvec{\xi }}(0)}. \end{aligned}$$
(60)

The shortest path connecting \(g(0) = g_0\) and \(g(1) = g_1\) is then

$$\begin{aligned} {g^*(t) = g_0 \circ \exp (t \cdot \log (g_{0}^{-1} \circ g_1))} \end{aligned}$$
(61)

where log is the matrix logarithm. However, if \(S_{lj}^{i} \ne -S_{jl}^{i}\) then the path generated by solving the E–P variational equations is generally not this geometric one. For example, when \(G=SO(3)\) and when the moment of inertia is not isotropic, the optimal path between rotations can become very complicated. This is relevant to satellite attitude reorientation problems, as discussed in [35, 36].
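
The geodesic (61) is easy to evaluate with the matrix exponential and (principal) matrix logarithm. The following sketch does this for \(SO(3)\) (illustrative endpoints; it assumes \(g_0^{-1}g_1\) lies in the range where the principal logarithm is appropriate):

```python
# Sketch of (61): g*(t) = g0 exp(t log(g0^{-1} g1)), evaluated here on SO(3).
import numpy as np
from scipy.linalg import expm, logm

def geodesic(g0, g1, t):
    X = np.real(logm(np.linalg.inv(g0) @ g1))   # principal matrix logarithm
    return g0 @ expm(t * X)

def rot_z(a):
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

def rot_x(a):
    return np.array([[1.0, 0.0,        0.0],
                     [0.0, np.cos(a), -np.sin(a)],
                     [0.0, np.sin(a),  np.cos(a)]])

g0, g1 = rot_z(0.3), rot_x(1.1)
path = [geodesic(g0, g1, t) for t in np.linspace(0.0, 1.0, 11)]
assert np.allclose(path[0], g0) and np.allclose(path[-1], g1)
```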

5.2 The rotation group SO(3) and the sphere \(\mathbb {S}^2\)

5.2.1 Euler–Poincaré Equations for SO(3)

For the case when \(G=SO(3)\), group elements are \(3\times 3\) special orthogonal (rotation) matrices, denoted as R, and the Lie algebra elements are skew-symmetric matrices. In particular \(R^T \dot{R} = \Omega \) corresponds to the body-fixed description of angular velocity, \(\varvec{\omega }\), and the notation in (52) is used, i.e., \(\varvec{\omega }= \Omega ^{\vee }\) and \(\Omega = \hat{\varvec{\omega }}\).

Minimizing a functional of the form

$$\begin{aligned} J = \frac{1}{2}\int _{t_1}^{t_2} (\varvec{\omega }-\varvec{\omega }_0)^T \mathcal{I} (\varvec{\omega }-\varvec{\omega }_0) dt \end{aligned}$$
(62)

arises in several applications. For example, when \(\varvec{\omega }_0 = \textbf{0}\) this corresponds to minimizing kinetic energy due to rotation where \(\mathcal{I}\) is the moment of inertia matrix. The Euler–Poincaré equations corresponding to the above functional are

$$\begin{aligned} \mathcal{I} \dot{\varvec{\omega }} \,+\, \varvec{\omega }\times \mathcal{I} (\varvec{\omega }-\varvec{\omega }_0) \,=\, \textbf{0}, \end{aligned}$$
(63)

and these become the classical Euler equations of motion without applied moments when \(\varvec{\omega }_0 = \textbf{0}\).

Trajectory planning on SO(3) is both of fundamental importance as an example of more general geometric methods as in [37,38,39,40,41,42,43], and of practical value in elegant engineering and computer graphics applications such as motion interpolation [44, 45] and spacecraft attitude reorientation [35, 36]. In the subsections that follow, specific variational solutions are examined, and their minimality properties are queried.

5.2.2 Geodesics on SO(3)

The situation in (60) occurs for the functional in (62) in the special case when \(\mathcal{I} = \mathbb {I}\) and \(\varvec{\omega }_0 = \textbf{0}\), and then (61) applies. That is, the geodesic path connecting \(R_0,R_1 \in SO(3)\) with \(\textrm{tr}(R_0^T R_1) > -1\) (so that they are not rotations related by a rotation of angle \(\pi \)) is

$$\begin{aligned} {R^*(t) = R_0 \circ \exp (t \cdot \log (R_0^T R_1)).} \end{aligned}$$
(64)

Furthermore, this path can be shown to be globally optimal in this case because of the structure of the cost function. To verify this statement, suppose that \(R^*(t)\) is replaced with \(R(t) = R^*(t) Q(t)\) where Q(t) is an arbitrary differentiable path in G with end points \(Q(0)=Q(1) = e\). Then

$$\begin{aligned} R^{T} \dot{R} \,=\, (R^* Q)^{T} \{\dot{R}^* Q + R^* \dot{Q}\} \,=\, Q^{T} \left( (R^*)^{T} \dot{R}^*\right) Q + Q^{T} \dot{Q} . \end{aligned}$$

With \(\varvec{\omega }\,\doteq \, \left( R^{T} \dot{R}\right) ^{\vee }\), \( \varvec{\omega }^* \,=\, \left( (R^*)^{T} \dot{R}^*\right) ^{\vee }\), and \(\varvec{\omega }_Q \,\doteq \, \left( Q^{T} \dot{Q}\right) ^{\vee }\) we have

$$\begin{aligned} \varvec{\omega }\,=\, Q^T \varvec{\omega }^* + \varvec{\omega }_Q . \end{aligned}$$

Then

$$\begin{aligned} \Vert \varvec{\omega }\Vert ^2 = \left\| \varvec{\omega }^*\right\| ^2 + 2 (\varvec{\omega }^*)^T Q \varvec{\omega }_Q + \left\| \varvec{\omega }_Q\right\| ^2 \end{aligned}$$

Note that \(\varvec{\omega }^*\) is constant and \(Q \varvec{\omega }_Q = (\dot{Q} Q^T)^{\vee } = \varvec{\omega }_Q^s\) (space-fixed description of angular velocity corresponding to Q(t)). Therefore

$$\begin{aligned} \int _{0}^{1} (\varvec{\omega }^*)^T Q \varvec{\omega }_Q dt = (\varvec{\omega }^*)^T \int _{0}^{1} \varvec{\omega }_Q^s dt . \end{aligned}$$

For small rotations Q(t) that do not move far from the identity for \(t\in [0,1]\), the boundary conditions \(Q(0) = Q(1) = \mathbb {I}\) force

$$\begin{aligned} \int _{0}^{1} \varvec{\omega }_Q^s dt \,=\, \textbf{0} . \end{aligned}$$

Moreover, both \(\varvec{\omega }^*\) and any perturbation Q (whether small or large) must obey the invariance of the problem. For example, if the start and end points are reversed, then \(\varvec{\omega }^* \rightarrow -\varvec{\omega }^*\). Since \(Q(0) = Q(1) = \mathbb {I}\), and since SO(3) is a symmetric space, then unless Q(t) executes a full \(2\pi \) rotation (which cannot help, since the total angle traversed from \(R_0\) to \(R_1\) is less than \(\pi \)), symmetry under swapping the roles of the end points indicates that the angular velocity \(\varvec{\omega }_Q^s\) must integrate to zero. That is, there is no way to do better than \(R^*(t)\), and the argument is essentially the same as for the line in the plane.
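
This argument can be spot-checked numerically (not a proof, just an illustration with made-up endpoints): perturb \(R^*(t)\) by a loop \(Q(t) = \exp (\sin (\pi t)\hat{\textbf{v}})\), which satisfies \(Q(0)=Q(1)=\mathbb {I}\), and compare costs. The finite-difference approximation of \(R^T\dot{R}\) below is an assumption of the sketch, not something taken from the paper:

```python
# Sketch: the geodesic cost (1/2) int ||omega||^2 dt does not decrease under a
# perturbation R(t) = R*(t) Q(t) with Q(0) = Q(1) = I.
import numpy as np
from scipy.linalg import expm, logm

def hat(v):
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def cost(R_of_t, ts):
    """Approximate (1/2) int ||(R^T Rdot)^vee||^2 dt; note ||Omega||_F^2 = 2 ||omega||^2."""
    J, dt = 0.0, ts[1] - ts[0]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        Omega = np.real(logm(R_of_t(t0).T @ R_of_t(t1))) / dt   # approx. R^T Rdot
        J += 0.25 * np.sum(Omega**2) * dt
    return J

R0, R1 = np.eye(3), expm(hat([0.0, 0.0, 1.2]))
X = np.real(logm(R0.T @ R1))
R_star = lambda t: R0 @ expm(t * X)                              # the geodesic (64)

v = np.array([0.4, -0.2, 0.1])
R_pert = lambda t: R_star(t) @ expm(np.sin(np.pi * t) * hat(v))  # admissible perturbation

ts = np.linspace(0.0, 1.0, 400)
assert cost(R_star, ts) <= cost(R_pert, ts)
```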

5.2.3 The case when \(\varvec{\omega }_0 \,\ne \, 0\)

The E–P equation in (63) when \(\mathcal{I} = \mathbb {I}\) and \(\varvec{\omega }_0 \,\ne \, 0\) results in

$$\begin{aligned} \dot{\varvec{\omega }} \,-\, \varvec{\omega }\times \varvec{\omega }_0 \,=\, \textbf{0}. \end{aligned}$$
(65)

This is a linear system of equations with solution

$$\begin{aligned} \varvec{\omega }(t) \,=\, \exp (-t \hat{\varvec{\omega }}_0) \varvec{\omega }(0) \,=\, \left( \mathbb {I}- \sin (\Vert {\varvec{\omega }}_0\Vert t) \frac{\hat{\varvec{\omega }}_0}{\Vert {\varvec{\omega }}_0\Vert } \,+\, \left( 1-\cos (\Vert {\varvec{\omega }}_0\Vert t)\right) \frac{\hat{\varvec{\omega }}_0^2}{\Vert {\varvec{\omega }}_0\Vert ^2}\right) \varvec{\omega }(0) \,. \end{aligned}$$
(66)

Since \(\exp (t \hat{\varvec{\omega }}_0) \, \varvec{\omega }_0 \,=\, \varvec{\omega }_0\) it follows that

$$\begin{aligned} \Vert \varvec{\omega }(t) - \varvec{\omega }_0\Vert ^2 \,=\, \Vert \varvec{\omega }(0) - \varvec{\omega }_0\Vert ^2. \end{aligned}$$

With initial conditions \(R(0) = \mathbb {I}\), the small-time solution for R(t) is then

$$\begin{aligned} R(t \ll 1) \,\approx \, \mathbb {I}+ \int _{0}^{t} \hat{\varvec{\omega }}(\tau ) \, d\tau \,=\, \mathbb {I}+ \hat{\varvec{\theta }} \end{aligned}$$

where

$$\begin{aligned} {\varvec{\theta }}(t) \,=\, \int _{0}^{t} {\varvec{\omega }}(\tau ) \, d\tau \,=\, \left( t \mathbb {I}+ \left[ \cos (\Vert {\varvec{\omega }}_0\Vert t) -1\right] \frac{\hat{\varvec{\omega }}_0}{\Vert {\varvec{\omega }}_0\Vert ^2} \,+\, \left( t\Vert {\varvec{\omega }}_0\Vert -\sin (\Vert {\varvec{\omega }}_0\Vert t)\right) \frac{\hat{\varvec{\omega }}_0^2}{\Vert {\varvec{\omega }}_0\Vert ^3}\right) \varvec{\omega }(0) . \end{aligned}$$

A small perturbation of this will be \(R(t) \exp (\hat{\varvec{\epsilon }})\) with \(\varvec{\epsilon }(0) = \varvec{\epsilon }(1) = \textbf{0}\), or \(\varvec{\theta }\rightarrow \varvec{\theta }+ \varvec{\epsilon }\), leading to the cost \(\Vert \varvec{\omega }(t) - \varvec{\omega }_0 + \dot{\varvec{\epsilon }}\Vert ^2\). As with the case of the line in the plane, this reduces to \(\Vert \varvec{\omega }(t) - \varvec{\omega }_0\Vert ^2 + \Vert \dot{\varvec{\epsilon }}\Vert ^2\) because the cross term integrates to zero.

5.2.4 Geodesics on the sphere

A convenient way to describe geodesics connecting two non-collinear points \(\textbf{a}, \textbf{b} \in \mathbb {S}^2\) is via the rotation matrix \({R}(\textbf{a},\textbf{b})\), which most directly transforms a unit vector \(\textbf{a}\) into the unit vector \(\textbf{b}\):

$$\begin{aligned} \textbf{b} = {R}(\textbf{a},\textbf{b}) \textbf{a}. \end{aligned}$$

Let \(\theta _{ab} \in [0,\pi ]\) denote the angle of rotation measured counterclockwise from \(\textbf{a}\) to \(\textbf{b}\) around the axis defined by \(\textbf{a} \times \textbf{b}\). Then, again using the notation in (52), it can be shown that [22]

$$\begin{aligned} {R}(\textbf{a},\textbf{b}) = \exp \left( \frac{\theta _{ab}}{\sin \theta _{ab}}\, \widehat{\textbf{a} \times \textbf{b}}\right) = \mathbb {I}+ \widehat{\textbf{a} \times \textbf{b}} \,+\, \frac{(1- \textbf{a} \cdot \textbf{b})}{\Vert \textbf{a} \times \textbf{b}\Vert ^2} \left( \widehat{\textbf{a} \times \textbf{b}}\right) ^2. \end{aligned}$$
(67)

Then the minimal length geodesic arc connecting \(\textbf{a}\) and \(\textbf{b}\) is

$$\begin{aligned} \textbf{u}(t) = \exp (t \log {R}(\textbf{a},\textbf{b}))\, \textbf{a} = \textbf{a} + \sin (t\theta _{ab}) \frac{\widehat{\textbf{a} \times \textbf{b}}}{\Vert \textbf{a} \times \textbf{b}\Vert }\, \textbf{a} \,+\, \left( 1-\cos (t\theta _{ab})\right) \frac{\left( \widehat{\textbf{a} \times \textbf{b}}\right) ^2}{\Vert \textbf{a} \times \textbf{b}\Vert ^2}\, \textbf{a} \end{aligned}$$

for \(t \in [0,1]\). In the above, the hat notation in (52) is used.
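
A short sketch of this construction (the helper names are mine): build \(R(\textbf{a},\textbf{b})\) from (67) and sweep out the arc with the matrix exponential and logarithm:

```python
# Sketch of (67) and the great-arc geodesic u(t) = exp(t log R(a,b)) a.
import numpy as np
from scipy.linalg import expm, logm

def hat(v):
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def R_ab(a, b):
    """Rotation of (67) taking the unit vector a to the unit vector b."""
    c = np.cross(a, b)
    return np.eye(3) + hat(c) + (1.0 - a @ b) / (c @ c) * (hat(c) @ hat(c))

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 0.6, 0.8])
R = R_ab(a, b)
assert np.allclose(R @ a, b)

X = np.real(logm(R))
u = lambda t: expm(t * X) @ a                   # u(0) = a, u(1) = b, ||u(t)|| = 1
assert np.isclose(np.linalg.norm(u(0.5)), 1.0)
```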

5.3 Globally optimal solutions on direct and semi-direct products

Here the concept of \((\theta ^*,s^*)\) from optimal curve rolling and reparameterization is generalized where each variational subproblem is in a subgroup of a larger group, which then are bootstrapped to obtain solutions on the larger space.

5.3.1 Direct products

Let G and H be matrix Lie groups with respective Lie algebra basis elements \(\{E_i\}\) and \(\{\mathcal{E}_j\}\). Consider the direct product with elements \((g,h) \in G \times H\) and the direct product group law

$$\begin{aligned} (g_1,h_1)(g_2,h_2) = (g_1 g_2, h_1 h_2) . \end{aligned}$$

Let \(\varvec{\omega }= (g^{-1} \dot{g})^{\vee }\) and \(\textbf{v} = (h^{-1} \dot{h})^{\vee }\) denote the velocities. (The reason for this choice of names will become clear soon.) As an example, the so-called pose change group [46] is the direct product \(SO(3) \times \mathbb {R}^3\). This can be described as a matrix Lie group with elements of the form of a direct sum

$$\begin{aligned} (R,t) \,=\, R \oplus \left( \begin{array}{cc} \mathbb {I}&{}\quad \textbf{t} \\ \textbf{0}^T &{}\quad 1 \end{array}\right) \,\in \, \mathbb {R}^{7\times 7}. \end{aligned}$$

In this context

$$\begin{aligned} h^{-1} \dot{h} \,=\, \left( \begin{array}{cc} \mathbb {I}&{}\quad -\textbf{t} \\ \textbf{0}^T &{}\quad 1 \end{array}\right) \left( \begin{array}{cc} \mathbb {O}&{}\quad \dot{\textbf{t}} \\ \textbf{0}^T &{}\quad 0 \end{array}\right) = \left( \begin{array}{cc} \mathbb {O}&{}\quad \dot{\textbf{t}} \\ \textbf{0}^T &{}\quad 0 \end{array}\right) \end{aligned}$$

and \(\textbf{v} = (h^{-1} \dot{h})^{\vee } = \dot{\textbf{t}}\).

\(SO(3) \times \mathbb {R}^3\) is not the group of handedness preserving Euclidean motions, but it still is relevant in describing changes in pose. For example, if an aerial vehicle is moving, then it can be convenient to track its relative orientation and absolute position as described in [47]. Similarly, for protein-protein docking this direct product approach is often taken [48,49,50].

Moreover, if \((R, \textbf{t}) \in SO(3) \times \mathbb {R}^3\), it is convenient to define body-fixed angular velocity \(\varvec{\omega }= (R^T \dot{R})^{\vee }\) and the space fixed view of translational velocity \(\textbf{v} = \dot{\textbf{t}}\). Indeed, in the classical expression for kinetic energy

$$\begin{aligned} T = \frac{1}{2}\varvec{\omega }^T \mathcal{I} \varvec{\omega }\,+\, \frac{1}{2}m \textbf{v} \cdot \textbf{v} \end{aligned}$$

this is exactly what is done.

It is clear that if the globally minimal solutions to

$$\begin{aligned} J_1 = \int _{t_1}^{t_2} f_1(g,\varvec{\omega },t) dt \,\,\,\textrm{and}\,\,\, J_2 = \int _{t_1}^{t_2} f_2(h,\textbf{v},t) dt \end{aligned}$$

are obtained separately, then the globally minimal solution for \(J_1+J_2\) on the product group will be obtained by pairing the solutions for each independently as \((g^*(t), h^*(t))\). In the case of the direct product \(SO(3) \times \mathbb {R}^3\)

$$\begin{aligned} (R^*(\tau ),\textbf{t}^*(\tau )) = \left( R_0 \circ \exp (\tau \cdot \log (R_0^T R_1))\,,\, \textbf{t}_0 + (\textbf{t}_1 - \textbf{t}_0)\tau \right) \,. \end{aligned}$$
(68)
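
In code, (68) simply pairs the rotational geodesic with linear interpolation of the translation; a minimal sketch (the function name is illustrative) is:

```python
# Sketch of (68): the jointly optimal path on SO(3) x R^3.
import numpy as np
from scipy.linalg import expm, logm

def pose_change_geodesic(R0, t0, R1, t1, tau):
    R_tau = R0 @ expm(tau * np.real(logm(R0.T @ R1)))    # rotation geodesic, cf. (64)
    t_tau = t0 + tau * (t1 - t0)                         # straight-line translation
    return R_tau, t_tau
```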

Slightly more interesting than that is when the cost is of the form

$$\begin{aligned} f(g,h,\varvec{\omega },\textbf{v}) = f_1(g,\varvec{\omega })\,+\,\frac{1}{2} \Vert \textbf{v} - A(g) \varvec{\omega }\Vert _W^2 \end{aligned}$$
(69)

in analogy with the coordinate-dependent discussion of bootstrapping earlier in the paper. Assuming that the \(f_1\) terms already satisfy the E–P equation on their own, the E–P equation for g reduces to

$$\begin{aligned} \frac{d}{dt}\left( \textbf{e}_i^T A^T W(A \varvec{\omega }- \textbf{v})\right) + \textbf{e}_i^T\left( \varvec{\omega }\times A^T W (A \varvec{\omega }- \textbf{v})\right) = (A \varvec{\omega }- \textbf{v})^T W \, (\tilde{E}_i A)\, \varvec{\omega }, \end{aligned}$$

and the E–P equation for h will reduce to

$$\begin{aligned} \frac{d}{dt} (A \varvec{\omega }- \textbf{v}) = \textbf{0}. \end{aligned}$$

The condition for global minimality

$$\begin{aligned} \textbf{v} = A(g) \varvec{\omega }\end{aligned}$$

is obtained when \(\varvec{\omega }(0) = \textbf{0}\).

5.3.2 Semidirect products

Sometimes cost functions such as those in (69) arise naturally in the context of semi-direct products rather than direct products. The group of special Euclidean motions

$$\begin{aligned} SE(3) = \mathbb {R}^3 \rtimes SO(3) \end{aligned}$$

is such a semi-direct product. Its elements reside in the same underlying space as the pose change group, \(g_i = (R_i, \textbf{t}_i)\), but the group law is different:

$$\begin{aligned} g_1 g_2 = (R_1 R_2, R_1 \textbf{t}_2 + \textbf{t}_1). \end{aligned}$$

The standard way to describe SE(3) as a matrix Lie group is to write its elements as

$$\begin{aligned} g \,=\, \left( \begin{array}{cc} R &{}\quad \textbf{t} \\ \textbf{0}^T &{}\quad 1 \end{array}\right) \,\in \, \mathbb {R}^{4\times 4}. \end{aligned}$$

The corresponding velocities can be written as

$$\begin{aligned} \varvec{\xi }= (g^{-1} \dot{g})^{\vee } \,=\, \left( \begin{array}{c} \varvec{\xi }_1 \\ \varvec{\xi }_2 \end{array}\right) \,=\, \left( \begin{array}{c} (R^T \dot{R})^{\vee } \\ R^T \textbf{v} \end{array}\right) . \end{aligned}$$

For example, in modeling the minimal energy shapes of DNA molecules treated as elastic filaments, a common model to describe the strain energy at the point s along a backbone curve is of the form

$$\begin{aligned} f(g,\varvec{\xi }) = \frac{1}{2}\Vert \varvec{\xi }- \varvec{\xi }_0\Vert _{K}^2 . \end{aligned}$$

For special cases of K, the theory in Sect. 5.1 can be applied. The resulting geodesics will be different from those in the direct product case, with the translational parts taking helical paths. In principle, similar expressions can be used when modeling large elastic deformations in sheets and solids comprised of elastic filaments in multiple directions. Then such a formulation may be applied to robots powered by transparent elastic actuators, as in [51].

For special cases of A(g) it is also possible to capture a cost function such as (69) in the form of \(\frac{1}{2}\Vert \varvec{\xi }\Vert _{K}^2\). In particular, writing

$$\begin{aligned} \Vert \varvec{\xi }\Vert _{K}^2 \,=\, \varvec{\xi }_1^T K_{11} \varvec{\xi }_1 \,+\, \Vert \varvec{\xi }_2 - A_0 \varvec{\xi }_1\Vert _W^2 \end{aligned}$$

we recognize that \(\varvec{\xi }_1 = \varvec{\omega }\) and \(\varvec{\xi }_2 = R^T \textbf{v}\), and when \(W = c\mathbb {I}\), then

$$\begin{aligned} \Vert \varvec{\xi }_2 - A_0 \varvec{\xi }_1\Vert _W^2 = c\, \Vert \textbf{v} - R A_{0}\, \varvec{\omega }\Vert ^2 , \end{aligned}$$

which is the second term in (69) (up to the factor of \(\frac{1}{2}\)) with \(A(g) = R A_0\) and \(W = c\mathbb {I}\). Then (69) can be written as a homogeneous problem on SE(3) rather than as a bootstrapped problem from SO(3) to \(SO(3) \times \mathbb {R}^3\). In other words, we can minimize the bootstrapped problem on \(SO(3) \times \mathbb {R}^3\) as a homogeneous one on SE(3) with \(f(\varvec{\xi }) = \frac{1}{2}\Vert \varvec{\xi }\Vert _{K}^2\) and where

$$\begin{aligned} K \,=\, \left( \begin{array}{cc} K_{11} + c A_0^T A_0 &{}\quad -c A_0^T \\ -c A_0 &{}\quad c\mathbb {I}\end{array}\right) . \end{aligned}$$

5.4 Application: extraction of salient actions from a video sequence

An \(M\times N\) black-and-white image consists of \(M\cdot N\) pixels, each with fixed location and pixel intensity (grayscale) values. When there is color, instead of a scalar, each pixel has a vector of RGB (red, green, blue) values. A black-and-white video can be viewed as a curve in \(\mathbb {R}^{M\times N}\) parameterized by time, and an individual image in the video is a point in this space. Similarly, a color video can be thought of as a curve in \(\mathbb {R}^{M\times N \times 3}\).

Given a video sequence, a problem of importance is activity recognition. For example, if a humanoid robot is charged with taking care of grandma, the robot should decode grandma’s actions and gestures such as waving, walking, and summoning. As such, salient actions need to be extracted from video sequences and matched with stored representatives of such action classes in a database. A first step in comparing two video sequences \(X_1(\tau )\) and \(X_2(\tau )\) for \(\tau \in [0,1]\) is to put the salient actions on a common timescale. Suppose that each video illustrates a human hand waving from left to right and back. One person may wave faster at the beginning and slower at the end, and the other person may wave slower at the beginning and faster at the end. This means that computing a quantity such as \(\int _{0}^{1} \Vert X_1(\tau ) - X_2(\tau )\Vert ^2 d\tau \) to measure the differences between the videos will give a large value, and miss the fact that both videos describe essentially the same waving simply because they are out of sync. A way to reconcile the differences in the trajectories described by the videos is to assign a different timescale for one relative to the other. This is one of the goals of dynamic time warping. The literature on this topic is immense, and some well known articles include [52,53,54].

An alternative to dynamic time warping is to optimally reparameterize both of the videos as \(X_i(\tau _i(t))\). Since \(dX_i/dt = X'_i(\tau ) \dot{\tau }\), a measure of the amount of change in video i at any instant is \(\Vert \dot{X}_i(\tau )\Vert ^2 = \Vert X'_i(\tau )\Vert ^2 \dot{\tau }^2\). An optimal timescale \(\tau _i^*(t)\) for each video can be defined as one that expands time when there is a lot of change in the video (i.e., large value of derivative), and compresses time when nothing is happening (i.e., small value of derivative). In other words, we seek \(X_i(\tau _i^*(t))\) where \(\tau _i^*\) minimizes the functional

$$\begin{aligned} J = \int _{0}^{1} \Vert X'_i(\tau )\Vert ^2 \dot{\tau }^2 dt . \end{aligned}$$

This is analogous to the curve reparametrization problem reviewed earlier in this paper, and normalizes the timescale of both videos according to how much is changing in each. This is the approach taken in [55, 56].
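
A discrete-time sketch of this reparameterization is given below. It assumes (as in the one-dimensional problem above) that the minimizer makes the pixel-space speed \(\Vert dX/dt\Vert \) constant, so frames are resampled by equidistributing cumulative pixel-space arclength; the array shapes and frame count are illustrative placeholders, not data from any cited work:

```python
# Sketch: reparameterize a video so that cumulative pixel-space arclength is uniform in t.
import numpy as np

rng = np.random.default_rng(1)
frames = rng.random((60, 32, 32))      # stand-in for a 60-frame grayscale video X(tau_k)

flat = frames.reshape(len(frames), -1)
d = np.linalg.norm(np.diff(flat, axis=0), axis=1)          # ||X(tau_{k+1}) - X(tau_k)||
arc = np.concatenate(([0.0], np.cumsum(d)))
arc /= arc[-1]                                             # normalized cumulative arclength

t = np.linspace(0.0, 1.0, len(frames))
tau_grid = np.linspace(0.0, 1.0, len(frames))
tau_star = np.interp(t, arc, tau_grid)                     # tau*(t): invert arc(tau) = t
idx = np.rint(tau_star * (len(frames) - 1)).astype(int)
resampled = frames[idx]                                    # X(tau*(t)), nearest-frame resampling
```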

The curvature of such an arclength parameterized curve in pixel space can be described using a higher-dimensional version of the Frenet–Serret apparatus, and the analogs of curvature and torsion can be used to identify motion features as explained in [57].

Moreover, rather than considering only the raw images in the video sequence, it is common to use image processing techniques to detect edges or extract other features that are tracked from frame to frame. These features undergo motions within the image plane, which can be thought of as being orthogonal to the temporal direction. Motions can be intrinsic to the action being observed by the camera, or they can be a result of nuisances, as would be the case if the camera is shaking when the video is taken, or if one person is waving from a stationary location and another is waving while riding a horse. When the goal is to isolate actions, the contribution of these nuisance motions should be eliminated as much as possible before comparing videos. It therefore makes sense to seek the optimal path \(g_i(\tau ) \in G\) (a group of nuisance motions) such that \(Y_i(\tau ) = g_i(\tau ) \cdot X_i(\tau )\) is minimally varying. Minimizing \(\int _{0}^{1} \Vert dY_i/d\tau \Vert ^2 d\tau \) results in an Euler–Poincaré problem on G. After nuisance motions are removed, the resulting \(Y_i^*(\tau ) = g_i^*(\tau ) \cdot X_i(\tau )\) can be optimally reparameterized as described above. The resulting motion-corrected and temporally reparameterized videos \(Y_i^*(\tau _i^*(t))\) can then be compared.

In this way a problem reduces to something analogous to the joint \(\theta \)-s problem in curve framing (with motion in the image plane being like \(\theta (t)\), and compression or expansion of the temporal variable being like s(t)). This joint problem was addressed in [58].

6 Conclusions

A number of variational calculus problems with globally optimal solutions are reviewed and used to construct new globally optimal solutions to variational problems on larger spaces. This “bootstrapping” approach is demonstrated in the context of simultaneously adjusting the twist of a space curve's framing relative to the Frenet frames to give the Bishop frames, together with reparametrizing the framed curve (starting from arclength) so that the resulting framing evolves more gradually. These ideas are then generalized to classes of coordinate-dependent variational problems and coordinate-free Lie-group problems with globally minimal solutions. In many cases of interest one underlying space can be endowed with multiple different group operations (e.g., direct product or semidirect product), and it is shown how different globally minimal trajectories result from bootstrapping from subgroups to these larger product groups.