# A contact covariant approach to optimal control with applications to sub-Riemannian geometry


## Abstract

We discuss contact geometry naturally related to optimal control problems (and the Pontryagin Maximum Principle). We explore and expand the observations of Ohsawa (Autom J IFAC 55:1–5, 2015), providing simple and elegant characterizations of normal and abnormal sub-Riemannian extremals.

### Keywords

Pontryagin maximum principle, Contact geometry, Contact vector field, Sub-Riemannian geometry, Abnormal extremal

### Mathematics Subject Classification

49K15 53D10 53C17 58A30

## 1 Introduction

*A contact interpretation of the Pontryagin Maximum Principle* In a recent paper, Ohsawa [17] observed that for normal solutions of the optimal control problem on a manifold *Q*, the Hamiltonian evolution of the covector \(\varvec{\varLambda }_t\) in \(\mathrm {T}^*(Q\times \mathbb {R})\) considered in the Pontryagin maximum principle (PMP) projects to a well-defined contact evolution in the projectivization \(\mathbb {P}(\mathrm {T}^*(Q\times \mathbb {R}))\). Here, \(Q\times \mathbb {R}\) is the extended configuration space (consisting of both the configurations *Q* and the costs \(\mathbb {R}\)) and \(\mathbb {P}(\mathrm {T}^*(Q\times \mathbb {R}))\) is equipped with a natural contact structure. Moreover, Ohsawa observed that the maximized Hamiltonian of the PMP is precisely the generating function of this contact evolution.

The above result was our basic inspiration to undertake this study. Our goal was to understand, from a geometric viewpoint, the role and origins of the above-mentioned contact structure in the PMP and to study possible limitations of the contact approach (e.g., whether it works equally well for abnormal solutions).

As a result, we prove Theorem 3, a version of the PMP in which the standard Hamiltonian evolution of a covector curve \(\varvec{\varLambda }_t\) in \(\mathrm {T}^*(Q\times \mathbb {R})\) along an optimal solution \(\varvec{q}(t)\in Q\times \mathbb {R}\) is replaced by a contact evolution of a curve of hyperplanes \(\varvec{\mathcal {H}}_t\) in \(\mathrm {T}(Q\times \mathbb {R})\) along this solution. (Note that the space of all hyperplanes in the tangent spaces of \(Q\times \mathbb {R}\) is the manifold of contact elements of \(Q\times \mathbb {R}\) and can be naturally identified with \(\mathbb {P}(\mathrm {T}^*(Q\times \mathbb {R}))\).) It is worth mentioning that this result is valid regardless of whether the solution is normal or abnormal and, moreover, that the contact evolution is given by a natural contact lift of the extremal vector field (regarded as a time-dependent vector field on \(Q\times \mathbb {R}\)). Finally, using the well-known relation between contact vector fields and smooth functions, we were able to interpret the Pontryagin maximized Hamiltonian as a generating function of the contact evolution of \(\varvec{\mathcal {H}}_t\).

It seems to us that, apart from the very recent paper of Ohsawa [17], the relation between optimal control and contact geometry has not been explored in the literature. This fact is not difficult to explain as the PMP in its Hamiltonian formulation has been very successful and as symplectic geometry is much better developed and understood than contact geometry. In our opinion, the contact approach to the PMP seems to be a promising direction of studies for at least two reasons. First of all it allows for a unified treatment of normal and abnormal solutions and, second, it seems to be closer to the actual geometric meaning of the PMP (we shall justify this statement below).

*About the proof* The justification of Theorem 3 is rather straightforward. In fact, it is just a matter of interpreting the classical proof of the PMP [18] (see also [13, 15]). Recall that, geometrically, the PMP states that at each point of the optimal trajectory \(\varvec{q}(t)\), the cone \(\varvec{\mathcal {K}}_t\subset \mathrm {T}_{\varvec{q}(t)}(Q\times \mathbb {R})\) approximating the reachable set can be separated, by a hyperplane \(\varvec{\mathcal {H}}_t\subset \mathrm {T}_{\varvec{q}(t)}(Q\times \mathbb {R})\), from the direction of decreasing cost (cf. Fig. 2). Thus, in its original sense, the PMP describes the evolution of a family of hyperplanes \(\varvec{\mathcal {H}}_t\) (i.e., a curve in the manifold of contact elements of \(Q\times \mathbb {R}\), identified with \(\mathbb {P}(\mathrm {T}^*(Q\times \mathbb {R}))\)) along the optimal solution. This evolution is induced by the flow of the optimal control on \(Q\times \mathbb {R}\). From this perspective, the only ingredient needed to prove Theorem 3 is to show that this flow induces a contact evolution (with respect to the natural contact structure) on \(\mathbb {P}(\mathrm {T}^*(Q\times \mathbb {R}))\). It is worth mentioning that the covector curve \(\varvec{\varLambda }_t\in \mathrm {T}^*(Q\times \mathbb {R})\) from the standard formulation of the PMP is nothing but an alternative description of the above-mentioned curve of hyperplanes, i.e., \(\varvec{\mathcal {H}}_t=\ker \varvec{\varLambda }_t\) for each time *t*. Obviously, such a \(\varvec{\varLambda }_t\) is not unique: it is defined only up to a non-zero rescaling.

*Applications* From the above perspective, it is obvious that describing the necessary optimality conditions of the PMP in terms of the hyperplanes \(\varvec{\mathcal {H}}_t\) (the contact approach) is closer to the actual geometric meaning of the PMP, as it contains direct information about the separating hyperplanes. By contrast, in the Hamiltonian approach this information is translated into the language of covectors (not to mention the non-uniqueness of the choice of \(\varvec{\varLambda }_t\)).

As an application of the contact approach, we study the sub-Riemannian (SR) geodesic problem. The SR geodesic problem on a manifold *Q* is an optimal control problem in which the controls parametrize trajectories tangent to a smooth distribution \(\mathcal {D}\subset \mathrm {T}Q\) and the cost of a trajectory is its length calculated via a given positive-definite symmetric bilinear form \(g:\mathcal {D}\times \mathcal {D}\rightarrow \mathbb {R}\) (the SR metric). In fact, due to the Cauchy–Schwarz inequality, the trajectories minimizing the length are exactly those that minimize the kinetic energy and are parametrized by arc length. In this setting, using some elementary geometric considerations, we were able to relate \(\mathcal {D}\) and *g* to the separating hyperplanes \(\varvec{\mathcal {H}}_t\) (Lemma 11). Consequently, still using elementary arguments, we derived the following two results about SR extremals:

- Theorem 5 completely characterizes abnormal SR extremals. It states that an absolutely continuous curve \(q(t)\in Q\) tangent to \(\mathcal {D}\) is an abnormal extremal if and only if the minimal distribution along *q*(*t*) which contains \(\mathcal {D}_{q(t)}\) and is invariant along *q*(*t*) under the flow of the extremal vector field is of rank smaller than \(\dim Q\). As a special case (for smooth vector fields) we obtain, in Corollary 1, the following result: if the distribution spanned by the iterated Lie brackets of a given \(\mathcal {D}\)-valued vector field \(X\in \varGamma (\mathcal {D})\) with all possible \(\mathcal {D}\)-valued vector fields, i.e.,
  $$\begin{aligned} \big \langle {{\text {ad}}_X^k(Z)\ |\ Z\in \varGamma (\mathcal {D}),\; k=0,1,2,\ldots }\big \rangle , \end{aligned}$$
  is of constant rank smaller than \(\dim Q\), then the integral curves of *X* are abnormal SR extremals.
- Theorem 6, in a similar manner (yet under the additional assumption that the controls are normalized with respect to the SR metric *g*), provides a complete characterization of normal SR extremals. It states that an absolutely continuous curve \(q(t)\in Q\), tangent to \(\mathcal {D}\), is a normal extremal if and only if it is of class \(C^1\) with an absolutely continuous derivative and the minimal distribution along *q*(*t*) which contains those elements of \(\mathcal {D}_{q(t)}\) that are *g*-orthogonal to \(\dot{q}(t)\) and is invariant along *q*(*t*) under the flow of the extremal vector field does not contain the direction tangent to *q*(*t*) at any point. Again, in the smooth case we conclude, in Corollary 2, that if for a given normalized vector field \(X\in \varGamma (\mathcal {D})\) the distribution spanned by the iterated Lie brackets of *X* with all possible \(\mathcal {D}\)-valued vector fields *g*-orthogonal to *X*, i.e.,
  $$\begin{aligned} \big \langle {{\text {ad}}_X^k(Z)\ |\ Z\in \varGamma (\mathcal {D}),\; g(Z,X)=0,\; k=0,1,2,\ldots }\big \rangle , \end{aligned}$$
  is of constant rank and does not contain *X* at any point of *q*(*t*), then the integral curves of *X* are normal SR extremals.

It should be stressed that the language of flows used throughout is much more effective, and in fact simpler, than the language of Lie brackets usually applied in the study of SR extremals. Indeed, Theorems 5 and 6 are valid for non-smooth (i.e., absolutely continuous) curves and bounded measurable controls, do not require any regularity assumptions (contrary to the characterizations in terms of Lie brackets), and apply to single trajectories (not necessarily to families of trajectories).

As an illustration of the above results we give a few examples. In particular, in Examples 1 and 8 we were able to provide a surprisingly easy derivation of the Riemannian geodesic equation (obtaining the equation \(\nabla _{\dot{\gamma }}\dot{\gamma }=0\) from the standard Hamiltonian approach is explained in [1, 20]). In Examples 3, 7, and 9, we re-discover some results of [16, 22] concerning rank-2 distributions.

*Organization of the paper* We begin with a technical introduction in Sect. 2. Our main goal in this part is to introduce, in a rigorous way, natural differential-geometric tools (Lie brackets, flows of time-dependent vector fields, distributions, etc.) in the non-smooth, time-dependent setting suitable for control theory (in general, we consider controls which are only locally bounded and measurable). Most of the results presented in this section are natural generalizations of results well known in the smooth case. They are essentially based on the local existence and uniqueness of solutions of ODEs in the sense of Caratheodory (Theorem 7). To avoid being too technical, we moved various parts of the exposition of this section (including some proofs and definitions) to the Appendix.

In Sect. 3, we briefly recall basic definitions and constructions of contact geometry. In particular, we show an elegant construction of contact vector fields (infinitesimal symmetries of contact distributions) in terms of equivalence classes of vector fields modulo the contact distribution. This construction is more fundamental than the standard one in terms of generating functions (which requires a particular choice of a contact form). It seems to us that so far it has not been presented in the literature.

In Sect. 4, we discuss in detail a natural contact structure on the projectivization of the cotangent bundle \(\mathbb {P}(\mathrm {T}^*M)\). In particular, we construct a natural contact transformation \(\mathbb {P}(F)\) of \(\mathbb {P}(\mathrm {T}^*M)\) induced by a diffeomorphism *F* of *M*. Later we study an infinitesimal counterpart of this construction, i.e., a natural lift of a vector field *X* on *M* to a contact vector field \(\mathbf {C}_{X}\) on \(\mathbb {P}(\mathrm {T}^*M)\).

In Sect. 5, we introduce the optimal control problem for a control system on a manifold *Q* and formulate the PMP in its standard version (Theorem 2). Later we sketch the standard proof of the PMP introducing the cones \(\varvec{\mathcal {K}}_t\) and the separating hyperplanes \(\varvec{\mathcal {H}}_t\). A proper interpretation of these objects, together with our previous considerations about the geometry of \(\mathbb {P}(\mathrm {T}^*M)\) from Sect. 4, allows us to conclude Theorems 3 and 4 which are the contact and the covariant versions of the PMP, respectively.

Finally, in the last Sect. 6, we concentrate our attention on the geometry of the cones \(\varvec{\mathcal {K}}_t\) and hyperplanes \(\varvec{\mathcal {H}}_t\) for the Riemannian and sub-Riemannian geodesic problems. The main results of that section, which characterize normal and abnormal SR extremals, were already discussed in detail in the paragraph “Applications” above.

## 2 Technical preliminaries

As indicated in the Introduction, in this paper we shall apply the language of differential geometry to optimal control theory. This requires some attention as differential geometry uses tools such as vector fields, their flows, distributions and Lie brackets which are a priori smooth, while in control theory it is natural to work with objects of lower regularity. The main technical difficulty is a rigorous introduction of the notion of the flow of a time-dependent vector field (TDVF) with the time-dependence being, in general, only measurable. A solution of this problem, provided within the framework of chronological calculus, can be found in [2]. The recent monograph [11] with a detailed discussion of regularity aspects is another exhaustive source of information about this topic.

Despite the existence of the above-mentioned excellent references, we decided to present our own account of the notion of the flow of a TDVF. The reasons for this decision are threefold. First, this makes our paper self-contained. Second, we do not need the full machinery of [2] or [11], so we can present a simplified approach. Finally, for future purposes we need to concentrate on some specific aspects (such as the transport of a distribution along an integral curve of a TDVF and the relation of this transport to the Lie bracket) which are present in neither [2] nor [11]. Our goal in this section is to give a minimal yet sufficient introduction to the above-mentioned concepts. We move technical details and rigorous proofs to the Appendix.

*Time-dependent vector fields and their flows* Let *M* be a smooth manifold. By a time-dependent vector field on *M* (denoted TDVF) we shall understand a family of vector fields \(X_t\in \mathfrak {X}(M)\) parametrized by a real parameter *t* (the *time*). Every such field defines the following non-autonomous ODE on *M*:

$$\begin{aligned} \dot{x}(t)=X_t(x(t)). \end{aligned}$$(2.1)

The assumption that \(X_t\) is *Caratheodory* in the sense of Definition 11 below guarantees that solutions of (2.1) (in the sense of Caratheodory) locally exist, are unique and are absolutely continuous with bounded derivatives (ACB, see the Appendix) with respect to the time *t*. For this reason, from now on we restrict our attention to TDVFs \(X_t\) satisfying this assumption. We will call them *Caratheodory TDVF*s. In a very similar context, the notion of a Caratheodory section was introduced in the recent monograph [11]. In the language of that work, our Caratheodory TDVF would be called a locally bounded Caratheodory vector field of class \(C^1\).

A solution of (2.1) with the initial condition \(x(t_0)=x_0\) will be denoted by \(x(t;t_0,x_0)\) and called an *integral curve* of \(X_t\). When speaking about families of such solutions with different initial conditions it will be convenient to introduce (local) maps \(A_{tt_0}:M\rightarrow M\) defined by \(A_{tt_0}(x_0):=x(t;t_0,x_0)\).

### Lemma 1

Let \(X_t\) be a Caratheodory TDVF on *M*. Then

- For *t* close enough to \(t_0\) the maps \(A_{tt_0}:M\rightarrow M\) are well-defined local diffeomorphisms.
- Moreover, they satisfy the following properties
$$\begin{aligned} A_{t_0t_0}={\text {id}}_M\quad \text { and }\quad A_{t\tau }(A_{\tau t_0})=A_{t t_0}\ , \end{aligned}$$(2.2)
whenever both sides are defined.

Since \(X_t\) is Caratheodory, it satisfies locally the assumptions of Theorem 7. Now the justification of Lemma 1 follows directly from the latter result. Properties (2.2) are merely a consequence of the fact that \(t\mapsto A_{tt_0}(x_0)\) is an integral curve of \(X_t\).

### Definition 1

The family of local diffeomorphisms \(A_{t\tau }:M\rightarrow M\) described in the above lemma will be called the *time-dependent flow of*\(X_t\) (TD flow).

Clearly, \(A_{tt_0}\) is a natural time-dependent analog of the flow of a vector field, which justifies the name “TD flow”. As for ordinary flows, there is a natural correspondence between TD flows and Caratheodory TDVFs.

### Lemma 2

Let \(A_{t\tau }:M\rightarrow M\) be a family of local diffeomorphisms satisfying (2.2) and such that for each choice of \(x_0\in M\) and \(t_0\in \mathbb {R}\) the map \(t\mapsto A_{tt_0}(x_0)\) is ACB. Then \(A_{t\tau }\) is a TD flow of some Caratheodory TDVF \(X_t\).

The natural candidate for such a TDVF is simply \(X_t(x):=\frac{\partial }{\partial \tau }\big |_{\tau =t}A_{\tau t}(x)\). The remaining details are left to the reader.
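To make the notions of a Caratheodory TDVF and its TD flow concrete, the following Python sketch (our own illustration, not taken from the paper) integrates the planar TDVF \(X_t(x)=u(t)\,(-x_2,x_1)\), where the hypothetical control \(u(t)\) switches between \(+1\) and \(-1\) every half unit of time. Such a field is bounded and measurable but discontinuous in *t*. The sketch splits the time interval at the switching instants, integrates the smooth autonomous field on each piece with the classical Runge–Kutta scheme, and then checks the properties (2.2) and the recovery of \(X_t\) from the flow as in Lemma 2. The names `u`, `flow` and `recovered_field` are hypothetical.

```python
import math


def u(t):
    """Hypothetical bang-bang control: +1 on [k, k + 1/2), -1 on [k + 1/2, k + 1)."""
    return 1.0 if (t % 1.0) < 0.5 else -1.0


def switching_times(lo, hi):
    """Switching instants k/2 lying strictly between lo and hi (assumes lo <= hi)."""
    k = math.floor(2 * lo) + 1
    times = []
    while k / 2 < hi:
        times.append(k / 2)
        k += 1
    return times


def rk4_rotation(c, x, h, n):
    """n classical RK4 steps of size h for the autonomous field c * (-x2, x1)."""
    def f(y):
        return (-c * y[1], c * y[0])

    for _ in range(n):
        k1 = f(x)
        k2 = f((x[0] + h / 2 * k1[0], x[1] + h / 2 * k1[1]))
        k3 = f((x[0] + h / 2 * k2[0], x[1] + h / 2 * k2[1]))
        k4 = f((x[0] + h * k3[0], x[1] + h * k3[1]))
        x = (x[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             x[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return x


def flow(t, t0, x0, steps_per_unit=2000):
    """A_{t t0}(x0): solve dx/dt = u(t) * (-x2, x1) from time t0 to time t.

    The interval is split at the switching instants, so on every piece the
    control is constant and the ODE is smooth (a Caratheodory solution).
    Works forwards (t > t0) and backwards (t < t0) in time.
    """
    cuts = switching_times(min(t0, t), max(t0, t))
    if t < t0:
        cuts.reverse()
    nodes = [t0] + cuts + [t]
    x = x0
    for a, b in zip(nodes, nodes[1:]):
        c = u((a + b) / 2)  # value of the control on the open piece (a, b)
        n = max(1, int(abs(b - a) * steps_per_unit))
        x = rk4_rotation(c, x, (b - a) / n, n)
    return x


def recovered_field(t, x, h=1e-6):
    """Forward difference approximating d/dtau|_{tau=t} A_{tau t}(x), cf. Lemma 2."""
    y = flow(t + h, t, x)
    return ((y[0] - x[0]) / h, (y[1] - x[1]) / h)


x0 = (1.0, 0.0)
direct = flow(1.3, 0.2, x0)
composed = flow(1.3, 0.7, flow(0.7, 0.2, x0))
# Both approximate the rotation of x0 by the angle int_{0.2}^{1.3} u = 0.1
print(direct, composed)
# Approximately X_{0.1}(0, 2) = (-2, 0)
print(recovered_field(0.1, (0.0, 2.0)))
```

Splitting at the discontinuities of \(u\) mirrors the Caratheodory notion of solution: on each piece the right-hand side is smooth, and the pieces are glued by continuity of the trajectory.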

*Distributions along integral curves of TDVFs* In this paragraph we introduce basic definitions and properties of distributions defined along a single ACB integral curve \(x(t)=x(t;t_0,x_0)\) (with \(t\in [t_0,t_1]\)) of a Caratheodory TDVF \(X_t\). In particular, for future purposes it will be crucial to understand the behavior of such distributions under the TD flow \(A_{t\tau }\) of \(X_t\).

### Definition 2

Let \(x(t)=x(t;t_0,x_0)\) with \(t\in [t_0,t_1]\) be an integral curve of a Caratheodory TDVF \(X_t\). A *distribution* \(\mathcal {B}\) *along* *x*(*t*) is a family of linear subspaces \(\mathcal {B}_{x(t)}\subset \mathrm {T}_{x(t)} M\) attached at each point of the considered curve. In general, the dimension of \(\mathcal {B}_{x(t)}\) may vary from point to point.

By an *ACB section of* \(\mathcal {B}\) we will understand a vector field *Z* along *x*(*t*) such that \(Z(x(t))\in \mathcal {B}_{x(t)}\) for every \(t\in [t_0,t_1]\) and the map \(t\mapsto Z(x(t))\) is ACB. The space of such sections will be denoted by \(\varGamma _{ACB}(\mathcal {B})\). A distribution \(\mathcal {B}\) along *x*(*t*) shall be called *charming* if pointwise it is spanned by a finite set of elements of \(\varGamma _{ACB}(\mathcal {B})\).

A distribution \(\mathcal {B}\) along *x*(*t*) is called *invariant* (or *respected by a TD flow* \(A_{t\tau }\)) *along* *x*(*t*) if \(\mathrm {T}A_{t\tau }\big (\mathcal {B}_{x(\tau )}\big )\subset \mathcal {B}_{x(t)}\) for all \(t,\tau \in [t_0,t_1]\).

Note that if \(\mathcal {B}\) is invariant along *x*(*t*), then it is of constant rank along *x*(*t*). This follows from the fact that each map \(A_{t\tau }\) is a local diffeomorphism.

Let us remark that the notion of a charming distribution is meant as a natural substitute for smoothness when a distribution is considered along a non-smooth curve. Indeed, the restriction of a smooth vector field on *M* to an ACB curve \(x(t;t_0,x_0)\) is a priori only an ACB vector field along \(x(t;t_0,x_0)\).

### Proposition 1

- A restriction of a locally finitely generated smooth distribution on *M* to an ACB curve \(x(t)=x(t;t_0,x_0)\) is charming.
- Let \(A_{t\tau }\) be the TD flow of a Caratheodory TDVF \(X_t\) and let \(\mathcal {B}\) be a distribution along an integral curve \(x(t)=x(t;t_0,x_0)\) of \(X_t\). If \(\mathcal {B}\) is \(A_{t\tau }\)-invariant along *x*(*t*), then it is also charming.

The justification of the above result is straightforward. As for the first statement, we have already observed that a restriction of a smooth vector field to an ACB curve is an ACB vector field. In the second case, the distribution \(\mathcal {B}\) is spanned by the vector fields \(\mathrm {T}A_{tt_0}(X^i)\) with \(i=1,\ldots ,k\), where \(\{X^1,\ldots ,X^k\}\) is any basis of \(\mathcal {B}_{x_0}\). By Lemma 12, these fields are ACB.

Given a distribution \(\mathcal {B}\) along *x*(*t*) we can always extend it to the smallest (with respect to inclusion) distribution along *x*(*t*) containing \(\mathcal {B}\) and respected by the TD flow \(A_{t\tau }\) along *x*(*t*). This construction will play a crucial role in geometric characterization of normal and abnormal SR extremals in Sect. 6.

### Proposition 2

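Let \(\mathcal {B}\) be a distribution along an integral curve \(x(t)=x(t;t_0,x_0)\), \(t\in [t_0,t_1]\), of a Caratheodory TDVF \(X_t\) and let \(A_{t\tau }\) be the TD flow of \(X_t\). Then

$$\begin{aligned} A_{\bullet }(\mathcal {B})_{x(t)}:=\sum _{\tau \in [t_0,t_1]}\mathrm {T}A_{t\tau }\big (\mathcal {B}_{x(\tau )}\big ) \end{aligned}$$

is the smallest distribution along *x*(*t*) which contains \(\mathcal {B}\) and is respected by the TD flow \(A_{t\tau }\) along *x*(*t*).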

Obviously, any distribution \(A_{t\tau }\)-invariant along *x*(*t*) and containing \(\mathcal {B}_{x(t)}\) must contain \(A_{\bullet }(\mathcal {B})_{x(t)}\). The fact that the latter is indeed \(A_{t\tau }\)-invariant along *x*(*t*) follows easily from property (2.2).

*Lie brackets and distributions* Constructing distributions \(A_{t\tau }\)-invariant along *x*(*t*) as in Proposition 2, although conceptually very simple, is not very useful from the practical point of view, as it requires calculating the TD flow \(A_{t\tau }\). This difficulty can be overcome by passing to an infinitesimal description in terms of Lie brackets, at the price of losing some generality. In this paragraph, we discuss this and some related problems in detail.

### Definition 3

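Let \(X_t\) be a Caratheodory TDVF, let \(x(t)=x(t;t_0,x_0)\) be its integral curve and let *Z* be a smooth vector field on *M*. We define *the Lie bracket of* \(X_t\) *and* *Z* *along* *x*(*t*) by the formula

$$\begin{aligned} [X_t,Z]_{x(t)}:=[X_t,Z](x(t)). \end{aligned}$$

That is, we calculate the standard Lie bracket of the vector fields \(X_t\) and *Z* for a fixed *t* and then evaluate it at the point *x*(*t*), thus obtaining a well-defined field of vectors along *x*(*t*) (the regularity of the map \(t\mapsto [X_t,Z]_{x(t)}\) is a separate issue that we shall discuss later).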

For future purposes, we would like to extend Definition 3 so that the bracket \([X_t,Z]_{x(t)}\) can also be calculated for fields *Z* of lower regularity. This can be done, at the price that \([X_t,Z]_{x(t)}\) is then defined only for almost every (a.e.) \(t\in [t_0,t_1]\). The details of this construction are provided below.

Recall that, in a local coordinate system, the Lie bracket of two smooth vector fields *X* and *Z* at a point \(x_0\) is given by

$$\begin{aligned} [X,Z]_{x_0}=\frac{\partial }{\partial t}\Big |_{0}Z(x(t))-\frac{\partial }{\partial s}\Big |_{0}X(z(s)), \end{aligned}$$

where \(t\mapsto x(t)\) is the integral curve of *X* emerging from \(x_0\) at time 0 (in particular, \(\frac{\partial }{\partial t}\big |_{0}x(t)=X(x_0)\)) and \(s\mapsto z(s)\) is the integral curve of *Z* emerging from \(x_0\) at time 0 (in particular, \(\frac{\partial }{\partial s}\big |_0z(s)=Z(x_0)\)). The above formula actually allows one to define \([X,Z]_{x_0}\) on any smooth manifold *M*, simply by taking it as the definition of the Lie bracket \([X,Z]_{x_0}\) in a particular local coordinate system on *M*. It is an easy exercise to show that \([X,Z]_{x_0}\) defined in this way is a true geometric object (i.e., it does not depend on the choice of a local chart). Note that to calculate \([X,Z]_{x_0}\) we only need to know *X* along \(s\mapsto z(s)\) and *Z* along \(t\mapsto x(t)\).

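Consider now a Caratheodory TDVF \(X_t\), its integral curve \(x(t)=x(t;t_0,x_0)\) and a smooth vector field *Z* on *M*. For a fixed \(t\in [t_0,t_1]\), applying the above formula to the vector fields \(X_t\) and *Z* at the point *x*(*t*), we get

$$\begin{aligned} [X_t,Z]_{x(t)}=\frac{\partial }{\partial \tau }\Big |_{\tau =t}Z(x_t(\tau ))-\frac{\partial }{\partial s}\Big |_{0}X_t(z(s,t)), \end{aligned}$$(2.3)

where \(\tau \mapsto x_t(\tau )\) is the integral curve of the vector field \(X_t\) (with *t* fixed) emerging from *x*(*t*) at time \(\tau =t\) and \(s\mapsto z(s,t)=z(s;0,x(t))\) is the integral curve of *Z* emerging from *x*(*t*) at \(s=0\), i.e., \(z(0,t)=x(t)=x(t;t_0,x_0)\) and \(\frac{\partial }{\partial s}\big |_0z(s,t)=Z(x(t))\). Observe now that by definition \(\dot{x}_t(t)=X_t(x(t))=\dot{x}(t)\) (at every *t* at which *x* is differentiable and satisfies (2.1)), and thus (2.3) holds with \(x_t(\tau )\) replaced by \(x(\tau )\). What is more, in this form (2.3) is well defined at a given time \(t\in [t_0,t_1]\) also for any vector field *Z* on *M* (not necessarily smooth) such that the map \(\tau \mapsto Z(x(\tau ))\) is differentiable at \(\tau =t\). This observation justifies the following statement.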

### Proposition 3

Assuming that \(t\mapsto Z(x(t))\) is an ACB map and that \(X_t\) is a Caratheodory TDVF, the Lie bracket \([X_t,Z]_{x(t)}\) is defined by formula (2.3) almost everywhere along *x*(*t*). In fact, it is well defined at all regular points of \(t\mapsto Z(x(t))\). Moreover, \(t\mapsto [X_t,Z]_{x(t)}\) is a measurable and locally bounded map.

The Lie bracket \([X_t,Z]_{x(t)}\) is completely determined by the values of *Z* along *x*(*t*) and by the values of \(X_t\) in a neighborhood of *x*(*t*).

In other words, formula (2.3) extends Definition 3: it allows one to calculate the Lie bracket \([X_t,Z]_{x(t)}\) at almost every point of a given integral curve *x*(*t*) of \(X_t\) for vector fields *Z* defined only along *x*(*t*) and such that \(t\mapsto Z(x(t))\) is ACB. This generalization is necessary in control theory: since \(t\mapsto x(t)\) is in general only ACB, even for a smooth vector field *Z* we cannot expect the map \(t\mapsto Z(x(t))\) to be more regular than ACB.

The above construction of the Lie bracket \([X_t,Z]_{x(t)}\) allows us to introduce the following natural notions.

### Definition 4

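Let \(\mathcal {B}\) be a distribution along an integral curve \(x(t)=x(t;t_0,x_0)\) of a Caratheodory TDVF \(X_t\). By \([X_t,\mathcal {B}]\) we shall understand the distribution along *x*(*t*) generated by the Lie brackets of \(X_t\) and all ACB sections of \(\mathcal {B}\):

$$\begin{aligned} [X_t,\mathcal {B}]_{x(t)}:=\big \langle {[X_t,Z]_{x(t)}\ |\ Z\in \varGamma _{ACB}(\mathcal {B})}\big \rangle . \end{aligned}$$

A distribution \(\mathcal {B}\) along *x*(*t*) will be called \(X_t\)-*invariant along* *x*(*t*) if

$$\begin{aligned} [X_t,\mathcal {B}]_{x(t)}\subset \mathcal {B}_{x(t)}\quad \text {for almost every } t\in [t_0,t_1]. \end{aligned}$$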

Note that neither \([X_t,\mathcal {B}]\) nor \(\mathcal {B}+[X_t,\mathcal {B}]\) need be a charming distribution along *x*(*t*), even if \(\mathcal {B}\) is, since in general there is no guarantee that these distributions are spanned by ACB sections (we can lose regularity when calculating the Lie bracket).

The following result explains the relation between the \(A_{t\tau }\)- and \(X_t\)-invariance of distributions along *x*(*t*).

### Theorem 1

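Let \(X_t\) be a Caratheodory TDVF, let \(x(t)=x(t;t_0,x_0)\), \(t\in [t_0,t_1]\), be its integral curve and let \(\mathcal {B}\) be a distribution along *x*(*t*). Then the following conditions are equivalent:

- (a) \(\mathcal {B}\) is respected by the TD flow \(A_{t\tau }\) of \(X_t\) along *x*(*t*).
- (b) \(\mathcal {B}\) is a charming distribution, \(X_t\)-invariant and of constant rank along *x*(*t*).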

The proof is given in the Appendix. Note that the equivalence between \(X_t\)- and \(A_{t\tau }\)-invariance is valid only if the considered distribution \(\mathcal {B}\) along *x*(*t*) satisfies regularity conditions: it has to be charming and of constant rank along *x*(*t*).

Given a charming distribution \(\mathcal {B}\) along *x*(*t*), it is clear in the light of the above result that \(A_{\bullet }(\mathcal {B})_{x(t)}\), the smallest distribution \(A_{t\tau }\)-invariant along *x*(*t*) and containing \(\mathcal {B}\), should be closed under the operation \([X_t,\cdot ]\). Thus, in the smooth case, it is natural to try to construct \(A_{\bullet }(\mathcal {B})\) in the following way.

### Lemma 3

Let *X* be a \(C^\infty \)-smooth vector field and let \(\mathcal {B}\) be a \(C^\infty \)-smooth distribution on *M*. Assume that along an integral curve \(x(t)=x(t;t_0,x_0)\) of *X* (with \(t\in [t_0,t_1]\)), the distribution spanned by the iterated Lie brackets of *X* with all possible \(\mathcal {B}\)-valued vector fields, i.e.,

$$\begin{aligned} {\text {ad}}^\infty _X(\mathcal {B}):=\big \langle {{\text {ad}}_X^k(Z)\ |\ Z\in \varGamma (\mathcal {B}),\; k=0,1,2,\ldots }\big \rangle , \end{aligned}$$(2.4)

is of constant rank along *x*(*t*). Then \({\text {ad}}^\infty _X(\mathcal {B})_{x(t)}\) is the smallest distribution along *x*(*t*) containing \(\mathcal {B}_{x(t)}\) and respected by \(A_{t}\), the flow of *X*, i.e., \({\text {ad}}^\infty _X(\mathcal {B})_{x(t)}=A_{\bullet }(\mathcal {B})_{x(t)}\).

### Proof

The justification of the above result is quite simple. By construction, \({\text {ad}}^\infty _X(\mathcal {B})_{x(t)}\) is the smallest distribution along *x*(*t*) containing \(\mathcal {B}_{x(t)}\) and closed under the operation \({\text {ad}}_X=[X,\cdot ]\). It is clear that \({\text {ad}}^\infty _X(\mathcal {B})\) is spanned by a finite number of smooth vector fields of the form \({\text {ad}}_X^k(Z)\), where \(Z\in \varGamma (\mathcal {B})\), and thus it is charming. Since it is also of constant rank along *x*(*t*) we can use Theorem 1 (for a time-independent vector field *X*) to prove that \({\text {ad}}^\infty _X(\mathcal {B})_{x(t)}\) is invariant along *x*(*t*) under the flow \(A_{t}\). We conclude that \(A_{\bullet }(\mathcal {B})_{x(t)}\subset {\text {ad}}^\infty _X(\mathcal {B})_{x(t)}\). On the other hand, since \(A_{\bullet }(\mathcal {B})_{x(t)}\) is \(A_{t}\)-invariant along *x*(*t*), again by Theorem 1, it must be closed with respect to the operation \([X,\cdot ]\). In particular, it must contain the smallest distribution along *x*(*t*) containing \(\mathcal {B}_{x(t)}\) and closed under the operation \([X,\cdot ]\). Thus, \(A_{\bullet }(\mathcal {B})_{x(t)}\supset {\text {ad}}^\infty _X(\mathcal {B})_{x(t)}\). This ends the proof. \(\square \)
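The following numerical sketch (our own illustration, assuming NumPy is available) checks Lemma 3 in the simplest case of a linear vector field \(X(x)=Ax\) on \(\mathbb {R}^n\) and a constant distribution \(\mathcal {B}=\mathrm {span}\{b\}\). For the constant field \(Z\equiv b\) one has \([X,Z]=-Ab\), so \({\text {ad}}^k_X(Z)=(-A)^kb\), while the flow of *X* is \(A_t=e^{tA}\), whose tangent map is again \(e^{tA}\). Hence \({\text {ad}}^\infty _X(\mathcal {B})\) is spanned by the Krylov vectors \(b, Ab, A^2b,\ldots \) (by the Cayley–Hamilton theorem, powers up to \(n-1\) suffice), whereas \(A_{\bullet }(\mathcal {B})\) is spanned by the transported vectors \(e^{sA}b\). The helper `expm_taylor` is a hypothetical truncated Taylor series, adequate only for matrices of moderate norm.

```python
import numpy as np


def expm_taylor(M, terms=40):
    """Truncated Taylor series of exp(M); adequate for matrices of small norm."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result


def bracket_span_rank(A, b):
    """Rank of span{ad_X^k(b)} = span{(-A)^k b : k = 0, ..., n-1} for X(x) = A x."""
    vectors = [b]
    for _ in range(A.shape[0] - 1):
        vectors.append(-A @ vectors[-1])
    return np.linalg.matrix_rank(np.column_stack(vectors))


def flow_span_rank(A, b, samples=25):
    """Rank of span{exp(sA) b : s in [0, 1]}: b transported by the flow of X."""
    vectors = [expm_taylor(s * A) @ b for s in np.linspace(0.0, 1.0, samples)]
    return np.linalg.matrix_rank(np.column_stack(vectors))


# X(x) = A x: a rotation in the (x1, x2)-plane and an expansion along x3
A = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0]])
b_planar = np.array([1.0, 0.0, 0.0])   # stays in the invariant (x1, x2)-plane
b_generic = np.array([1.0, 0.0, 1.0])  # generates the whole tangent space

for b in (b_planar, b_generic):
    print(bracket_span_rank(A, b), flow_span_rank(A, b))
```

For both choices of \(b\) the two ranks agree (2 and 3, respectively), as Lemma 3 predicts; here the constant-rank assumption holds trivially because all the fields involved are constant or linear.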

### Remark 1

Let us remark that the construction provided by (2.4) is, in general, not available in the non-smooth case. The basic reason is that the Lie bracket defined by (2.3) may be less regular than the initial vector fields, i.e., \([X_t,Z]\) may fail to be ACB along *x*(*t*) even if both \(X_t\) and *Z* are. Thus, by adding the iterated Lie brackets to the initial distribution \(\mathcal {B}\), we may lose the property that it is charming (cf. also the remark following Definition 4), which is essential for Theorem 1 to hold.

Also the constant rank condition is important, as otherwise the correspondence between \(X_t\)- and \(A_{t\tau }\)-invariance provided by Theorem 1 does not hold. If (2.4) is not of constant rank along *x*(*t*) we may only say that \({\text {ad}}^\infty _X(\mathcal {B})_{x(t)}\subset A_{\bullet }(\mathcal {B})_{x(t)}\) (see also Remark 10).

It is worth noticing that this situation resembles the well-known results of Sussmann [19] concerning the integrability of distributions: being closed under the Lie bracket is not sufficient for integrability, as the invariance with respect to the flows of distribution-valued vector fields is also needed. After adding an extra assumption that the rank of the distribution is constant, the latter condition can be relaxed.

By the results of Proposition 3, the property that a distribution \(\mathcal {B}\) is \(X_t\)-invariant along *x*(*t*) depends not only on \(\mathcal {B}\) and the values of a Caratheodory TDVF \(X_t\) along *x*(*t*), but also on the values of \(X_t\) in a neighborhood of that integral curve. It turns out, however, that in a class of natural situations the knowledge of \(X_t\) along *x*(*t*) suffices for checking the \(X_t\)-invariance.

### Lemma 4

Let \(\mathcal {D}\) be a smooth distribution of constant rank on \(M\), \(X_t\) a Caratheodory \(\mathcal {D}\)-valued TDVF and \(x(t)=x(t;t_0,x_0)\) (with \(t\in [t_0,t_1]\)) an integral curve of \(X_t\). Let \(\mathcal {B}\) be a charming distribution along *x*(*t*), such that \(\mathcal {D}_{x(t)}\subset \mathcal {B}_{x(t)}\) for every *t*. Then the property of \(\mathcal {B}\) being \(X_t\)-invariant along *x*(*t*) depends only on the values of \(X_t\) along *x*(*t*).

The proof is given in the Appendix.

## 3 The basics of contact geometry

*Contact manifolds and contact transformations* In this section, we recall basic facts from contact geometry. A contact structure on a manifold \(\mathcal {M}\) is a smooth co-rank one distribution \(\mathcal {C}\subset \mathrm {T}\mathcal {M}\) satisfying a certain maximal non-degeneracy condition. To formalize this condition we introduce the following geometric construction. From now on we shall assume that the pair \((\mathcal {M},\mathcal {C})\) consists of a smooth manifold \(\mathcal {M}\) and a smooth co-rank one distribution \(\mathcal {C}\) on \(\mathcal {M}\). Sometimes it will be convenient to treat \(\mathcal {C}\) as a vector subbundle of \(\mathrm {T}\mathcal {M}\).

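We define the *bundle normal to* \(\mathcal {C}\) in \(\mathrm {T}\mathcal {M}\) as the quotient

$$\begin{aligned} \mathrm {N}\mathcal {C}:=\mathrm {T}\mathcal {M}/\mathcal {C}. \end{aligned}$$

Since \(\mathcal {C}\) is of co-rank one, \(\mathrm {N}\mathcal {C}\) is a line bundle over \(\mathcal {M}\). Let *X* and *Y* be two \(\mathcal {C}\)-valued vector fields on \(\mathcal {M}\). It is easy to check that the class of their Lie bracket [*X*, *Y*] in \(\mathrm {N}\mathcal {C}\) is tensorial with respect to both *X* and *Y*. That is, for any pair of smooth functions \(\phi ,\psi \in C^\infty (\mathcal {M})\)

$$\begin{aligned} [\phi X,\psi Y]=\phi \psi [X,Y]\mod \mathcal {C}. \end{aligned}$$

Consequently, \(\beta (X,Y):=[X,Y]\mod \mathcal {C}\) defines a skew-symmetric \(\mathrm {N}\mathcal {C}\)-valued 2-form \(\beta :\mathcal {C}\times \mathcal {C}\rightarrow \mathrm {N}\mathcal {C}\).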
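The tensoriality of the bracket modulo \(\mathcal {C}\) follows from the Leibniz rule. For \(\mathcal {C}\)-valued vector fields *X*, *Y* and functions \(\phi ,\psi \in C^\infty (\mathcal {M})\) one computes

$$\begin{aligned} [\phi X,\psi Y]=\phi \psi [X,Y]+\phi X(\psi )\,Y-\psi Y(\phi )\,X . \end{aligned}$$

The last two terms are \(\mathcal {C}\)-valued, so they vanish in the quotient \(\mathrm {N}\mathcal {C}=\mathrm {T}\mathcal {M}/\mathcal {C}\), and only \(\phi \psi [X,Y]\) survives.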

### Definition 5

A pair \((\mathcal {M},\mathcal {C})\) consisting of a smooth manifold \(\mathcal {M}\) and a smooth co-rank one distribution \(\mathcal {C}\subset \mathrm {T}\mathcal {M}\) is called a *contact manifold* if the associated \(\mathrm {N}\mathcal {C}\)-valued 2-form \(\beta \) is non-degenerate, i.e., if \(\beta (X,\cdot )\equiv 0\) implies \(X\equiv 0\).

Sometimes we call \(\mathcal {C}\) a *contact structure* or a *contact distribution* on \(\mathcal {M}\).

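Observe that \(\mathcal {C}\) is necessarily of even rank (so \(\mathcal {M}\) is odd-dimensional). Indeed, since \(\mathrm {N}\mathcal {C}\) is a line bundle, after a local trivialization of \(\mathrm {N}\mathcal {C}\) the form \(\beta \) becomes an ordinary skew-symmetric 2-form on each fiber of \(\mathcal {C}\), and every skew-symmetric 2-form on an odd-dimensional vector space has a non-trivial kernel.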
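The linear-algebra fact used here can be verified directly. If *B* is the matrix of a skew-symmetric 2-form on a vector space of odd dimension \(2n+1\), then

$$\begin{aligned} \det B=\det B^{\top }=\det (-B)=(-1)^{2n+1}\det B=-\det B, \end{aligned}$$

so \(\det B=0\) and the form has a non-trivial kernel. Applied to \(\beta \) on each fiber of \(\mathcal {C}\), this shows that the rank of \(\mathcal {C}\) must be even, say \(2n\), and therefore \(\dim \mathcal {M}=2n+1\).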

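### Definition 6

A diffeomorphism \(F:\mathcal {M}\rightarrow \mathcal {M}\) preserving the contact distribution, i.e., satisfying \(\mathrm {T}F(\mathcal {C})=\mathcal {C}\), is called a *contact transformation*. By a contact vector field (CVF) on \(\mathcal {M}\) (or an *infinitesimal symmetry* of \((\mathcal {M},\mathcal {C})\)) we shall understand a smooth vector field \(X\in \mathfrak {X}(\mathcal {M})\) preserving the contact distribution \(\mathcal {C}\), i.e.,

$$[X,\mathcal {C}]\subset \mathcal {C}.$$

Equivalently, *X* is a CVF if and only if its (local) flow \(A_{t}\) consists of contact transformations (cf. Theorem 1).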
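
Definition 6 can be made concrete with the illustrative structure \(\ker ({\text {d}}z-y\,{\text {d}}x)\) on \(\mathbb {R}^3\), which is our assumption and not part of the paper. The sketch below tests at a single point whether a map sends a basis of \(\mathcal {C}_p\) into \(\mathcal {C}_{F(p)}\). The map \((x,y,z)\mapsto (x,2y,2z)\) pulls \({\text {d}}z-y\,{\text {d}}x\) back to twice itself, so it is a contact transformation. The map \((x,y,z)\mapsto (x,2y,z)\) is not. A pointwise test is only a necessary check, because the definition requires the property at every point.

```python
# Sketch: pointwise test of T F(C_p) = C_{F(p)} for C = ker(dz - y dx) on R^3.


def omega(point, v):
    """The 1-form dz - y dx evaluated at `point` on the vector `v`."""
    return v[2] - point[1] * v[0]


def contact_basis(point):
    """Basis of C at `point`: X1 = d/dx + y d/dz and X2 = d/dy."""
    return [(1.0, 0.0, point[1]), (0.0, 1.0, 0.0)]


def tangent_map(F, point, v, h=1e-6):
    """Approximate T F(v) at `point` by a central difference."""
    plus = [point[i] + h * v[i] for i in range(3)]
    minus = [point[i] - h * v[i] for i in range(3)]
    f_plus, f_minus = F(plus), F(minus)
    return [(f_plus[i] - f_minus[i]) / (2 * h) for i in range(3)]


def maps_contact_plane(F, point, tol=1e-6):
    """True if T F sends both basis vectors of C_point into C_F(point)."""
    image_point = F(point)
    return all(abs(omega(image_point, tangent_map(F, point, v))) < tol
               for v in contact_basis(point))


def scale_yz(point):
    x, y, z = point
    return (x, 2 * y, 2 * z)


def scale_y(point):
    x, y, z = point
    return (x, 2 * y, z)


p = (0.5, 1.5, -0.7)
print(maps_contact_plane(scale_yz, p))  # True
print(maps_contact_plane(scale_y, p))   # False
```

Because \(\mathrm {T}F\) is invertible and both planes have rank two, mapping a basis into \(\mathcal {C}_{F(p)}\) already gives \(\mathrm {T}F(\mathcal {C}_p)=\mathcal {C}_{F(p)}\).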

It is worth mentioning that the above relation between contact vector fields and flows consisting of contact transformations can be generalized to the context of TDVFs and TD flows (cf. Sect. 2). We will need this generalized relation in Sect. 5 after introducing control systems.

### Proposition 4

Let \(X_t\) be a Caratheodory TDVF on a contact manifold \((\mathcal {M},\mathcal {C})\) and let \(A_{t\tau }\) be the TD flow of \(X_t\). Then \(X_t\) is a contact vector field for every \(t\in \mathbb {R}\) (i.e., \([X_t,\mathcal {C}]\subset \mathcal {C}\)) if and only if the TD flow \(A_{t\tau }\) consists of contact transformations.

The proof follows directly from Theorem 1 by taking \(\mathcal {B}=\mathcal {C}\) (which is charming; cf. Proposition 1).

*Characterization of CVFs* It turns out that there is a one-to-one correspondence between CVFs on \(\mathcal {M}\) and sections of the normal bundle \(\mathrm {N}\mathcal {C}\).

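### Lemma 5

Let \((\mathcal {M},\mathcal {C})\) be a contact manifold. For every vector field \(X\in \mathfrak {X}(\mathcal {M})\) there exists a unique CVF \(C_{[X]}\) on \(\mathcal {M}\) whose class in \(\mathrm {N}\mathcal {C}\) coincides with the class [*X*] of *X*; in particular, \(C_{[X]}\) depends only on [*X*]. Consequently, the assignment \([X]\mapsto C_{[X]}\) is a one-to-one correspondence between sections of \(\mathrm {N}\mathcal {C}\) and CVFs on \(\mathcal {M}\).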

### Remark 2

Throughout we will denote by \(X\in \mathfrak {X}(\mathcal {M})\) vector fields on \(\mathcal {M}\), by \(Y\in \varGamma (\mathcal {C})\) vector fields valued in \(\mathcal {C}\) and by *C* (also with variants, like \(C_{\phi }, C_{[X]}\) or \(\mathbf {C}_{\phi }\)) contact vector fields.

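### Proof

Given \(X\in \mathfrak {X}(\mathcal {M})\), consider the \(\mathrm {N}\mathcal {C}\)-valued 1-form \(\alpha _X\) on \(\mathcal {C}\) defined by

$$\alpha _X(Y):=[Y,X]\mod \mathcal {C},$$

where *Y* is a \(\mathcal {C}\)-valued vector field. The correctness of this definition follows from the fact that for every \(\mathcal {C}\)-valued vector field *Y* and for any function \(\phi \in C^\infty (\mathcal {M})\) we have

$$[\phi Y,X]=\phi [Y,X]-X(\phi )Y\equiv \phi [Y,X]\mod \mathcal {C}.$$

Note that, by construction, *X* is a CVF if and only if \(\alpha _X\equiv 0\). We now define

$$C_{[X]}:=X+h(X),$$

where \(h(X)\) is a \(\mathcal {C}\)-valued vector field which, as we shall see, depends only on the \(\mathrm {N}\mathcal {C}\)-class of *X*. Indeed, we can interpret *h*(*X*) as the unique (note that \(\beta \) is non-degenerate) solution of the equation \(\alpha _X(\cdot )=\beta (h(X),\cdot )\). Now observe that if *X* and \(X'\) are two different representatives of [*X*], then \(Y:=X'-X\) is a \(\mathcal {C}\)-valued vector field on \(\mathcal {M}\). Thus, we have \(\alpha _Y(\cdot )=-\beta (Y,\cdot )\) and hence, using the obvious linearity of \(\alpha _X\) with respect to *X*, we get

$$\beta (h(X'),\cdot )=\alpha _{X'}(\cdot )=\alpha _X(\cdot )+\alpha _Y(\cdot )=\beta (h(X)-Y,\cdot ),$$

i.e., \(h(X')=h(X)-Y\) and therefore \(X'+h(X')=X+h(X)\). It remains to check that \(C_{[X]}\) is a CVF. For every \(\mathcal {C}\)-valued vector field *Y* we have

$$\alpha _{C_{[X]}}(Y)=\alpha _X(Y)+\alpha _{h(X)}(Y)=\beta (h(X),Y)-\beta (h(X),Y)=0.$$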

Finally, we need to check that every CVF is of the form \(C_{[X]}\). By construction the class of \(C_{[X]}\) in \(\mathrm {N}\mathcal {C}\) is equal to the class of *X* in \(\mathrm {N}\mathcal {C}\) (these two vector fields differ by the \(\mathcal {C}\)-valued vector field *h*(*X*)). Thus, the classes of CVFs of the form \(C_{[X]}\) realize every possible section of \(\mathrm {N}\mathcal {C}\). Now it is enough to observe that the \(\mathrm {N}\mathcal {C}\)-class uniquely determines a CVF. Indeed, if *C* and \(C'\) are two CVFs belonging to the same class in \(\mathrm {N}\mathcal {C}\), then their difference \(C-C'\) is a \(\mathcal {C}\)-valued CVF, i.e., \([C-C',Y]\equiv 0\mod \mathcal {C}\) for any \(\mathcal {C}\)-valued vector field *Y*. That is, \(\beta (C-C',\cdot )\equiv 0\) and from the non-degeneracy of \(\beta \) we conclude that \(C-C'\equiv 0\). This ends the proof. \(\square \)

### Remark 3

It is natural to call a vector field \(X\in \mathfrak {X}(\mathcal {M})\) (or its \(\mathrm {N}\mathcal {C}\)-class [*X*]) a *generator* of the CVF \(C_{[X]}\). Observe that the \(\mathrm {N}\mathcal {C}\)-class of the CVF \(C_{[X]}\) is the same as the class of its generator *X* (they differ by a \(\mathcal {C}\)-valued vector field *h*(*X*)).

In the literature (see, e.g., [14]), a contact distribution \(\mathcal {C}\) on \(\mathcal {M}\) is often presented as the kernel of a certain 1-form \(\omega \in \varLambda ^1(\mathcal {M})\); such an \(\omega \) is then called a *contact form*. In the language of \(\omega \), the maximum non-degeneracy condition can be expressed as the non-degeneracy of the 2-form \({\text {d}}\omega \) on \(\mathcal {C}\). Indeed, for \(\mathcal {C}\)-valued vector fields *X*, *Y* we have \({\text {d}}\omega (X,Y)=-\omega ([X,Y])\), so \({\text {d}}\omega |_{\mathcal {C}}\) is just \(\beta \) expressed in the trivialization of \(\mathrm {N}\mathcal {C}\) given by \(\omega \). The latter condition is equivalent to \(\omega \wedge ({\text {d}}\omega )^{\wedge n}\) being a volume form on \(\mathcal {M}\), where \(n=\frac{1}{2}{\text {rank}}\mathcal {C}\); in other words, \(\omega \wedge ({\text {d}}\omega )^{\wedge n}\) vanishes nowhere.
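
In dimension three (\(n=1\)), write \(\omega =A_1{\text {d}}x+A_2{\text {d}}y+A_3{\text {d}}z\). Then \(\omega \wedge {\text {d}}\omega =(A\cdot \operatorname {curl}A)\,{\text {d}}x\wedge {\text {d}}y\wedge {\text {d}}z\), so the contact condition becomes \(A\cdot \operatorname {curl}A\ne 0\). The sketch below is our own illustration and uses numerically approximated partial derivatives. It evaluates this density at a point for two forms:

- \({\text {d}}z-y\,{\text {d}}x\), which is contact;
- \({\text {d}}z-z\,{\text {d}}x\), which is not, because its kernel is integrable.

```python
# Sketch: in R^3, omega ^ d(omega) = (A . curl A) dx^dy^dz for omega = A . dx.

H = 1e-5  # finite-difference step (assumption)


def partial(A, point, i, j):
    """Central-difference approximation of dA_i/dx_j at `point`."""
    plus, minus = list(point), list(point)
    plus[j] += H
    minus[j] -= H
    return (A(plus)[i] - A(minus)[i]) / (2 * H)


def contact_density(A, point):
    """Coefficient of dx^dy^dz in omega ^ d(omega)."""
    curl = (
        partial(A, point, 2, 1) - partial(A, point, 1, 2),
        partial(A, point, 0, 2) - partial(A, point, 2, 0),
        partial(A, point, 1, 0) - partial(A, point, 0, 1),
    )
    a = A(point)
    return sum(a[k] * curl[k] for k in range(3))


def standard(point):
    """Components of dz - y dx."""
    return (-point[1], 0.0, 1.0)


def integrable(point):
    """Components of dz - z dx, whose kernel is integrable."""
    return (-point[2], 0.0, 1.0)


p = (0.2, 0.7, -1.1)
print(round(contact_density(standard, p), 6))    # 1.0: contact
print(round(contact_density(integrable, p), 6))  # 0.0: not contact
```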

CVFs also have an elegant characterization in terms of contact forms. One can show that CVFs are in one-to-one correspondence with smooth functions on \(\mathcal {M}\). Choose a contact form \(\omega \) such that \(\mathcal {C}=\ker \omega \). This correspondence is then given by the assignment \(\phi \mapsto C_{\phi }\), where \(C_{\phi }\) is the unique vector field on \(\mathcal {M}\) such that \(\omega (C_{\phi })=\phi \) and \((C_{\phi }\lrcorner {\text {d}}\omega )|_{\mathcal {C}}=-{\text {d}}\phi |_{\mathcal {C}}\). The function \(\phi \) is usually called the *generating function* of the corresponding CVF \(C_{\phi }\) associated with the contact form \(\omega \). Notice that, given a contact vector field \(C=C_{\phi }\) and a contact form \(\omega \), one can recover the generating function simply by evaluating \(\omega \) on *C*, i.e., \(\phi =\omega (C)\).
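
Take the illustrative contact form \(\omega ={\text {d}}z-y\,{\text {d}}x\) on \(\mathbb {R}^3\); this choice is ours, not the paper's. Solving \(\omega (C_{\phi })=\phi \) and \((C_{\phi }\lrcorner {\text {d}}\omega )|_{\mathcal {C}}=-{\text {d}}\phi |_{\mathcal {C}}\) on the basis \(\partial _x+y\partial _z,\ \partial _y\) of \(\mathcal {C}\) gives

$$C_{\phi }=-\phi _y\,\partial _x+(\phi _x+y\phi _z)\,\partial _y+(\phi -y\phi _y)\,\partial _z .$$

The sketch below implements this formula. The caller supplies \(\phi \) together with its partial derivatives, an interface made up for this example. The sketch then checks numerically that \(\omega (C_{\phi })=\phi \) and that \([C_{\phi },Y]\in \mathcal {C}\) for both basis fields *Y*. For \(\phi \equiv 1\) it returns \(\partial _z\), the Reeb vector field of \(\omega \).

```python
# Sketch: the contact vector field C_phi generated by phi for omega = dz - y dx.

H = 1e-5  # finite-difference step used only for the Lie bracket check


def omega(point, v):
    """The contact form dz - y dx evaluated on `v` at `point`."""
    return v[2] - point[1] * v[0]


def contact_field(phi_with_grad):
    """Return C_phi; phi_with_grad(p) gives (phi, phi_x, phi_y, phi_z) at p."""
    def field(point):
        phi, phi_x, phi_y, phi_z = phi_with_grad(point)
        y = point[1]
        return (-phi_y, phi_x + y * phi_z, phi - y * phi_y)
    return field


def lie_bracket(X, Y, point):
    """[X, Y] = (DY)X - (DX)Y, derivatives by central differences."""
    def along(field, direction):
        plus = [point[i] + H * direction[i] for i in range(3)]
        minus = [point[i] - H * direction[i] for i in range(3)]
        f_plus, f_minus = field(plus), field(minus)
        return [(f_plus[i] - f_minus[i]) / (2 * H) for i in range(3)]
    dY, dX = along(Y, X(point)), along(X, Y(point))
    return [dY[i] - dX[i] for i in range(3)]


def example_phi(point):
    """phi = x*y + z**2 together with its gradient (an arbitrary test function)."""
    x, y, z = point
    return (x * y + z * z, y, x, 2 * z)


def X1(point):
    return (1.0, 0.0, point[1])   # d/dx + y d/dz, a section of C


def X2(point):
    return (0.0, 1.0, 0.0)        # d/dy, a section of C


C = contact_field(example_phi)
reeb = contact_field(lambda point: (1.0, 0.0, 0.0, 0.0))
p = (0.4, -0.9, 1.3)
print(omega(p, C(p)), example_phi(p)[0])                   # equal up to rounding
print([omega(p, lie_bracket(C, Y, p)) for Y in (X1, X2)])  # both close to 0
print(reeb(p))                                             # d/dz, the Reeb field of omega
```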

It is interesting to relate the construction \(\phi \mapsto C_{\phi }\) to the construction \([X]\mapsto C_{[X]}\) given above. Namely, the choice of a contact form \(\omega \) allows us to introduce a vector field \(R\in \mathfrak {X}(\mathcal {M})\), known as the *Reeb vector field*, defined uniquely by the conditions \(\omega (R)=1\) and \(R\lrcorner {\text {d}}\omega =0\). Since *R* is nowhere contained in \(\mathcal {C}=\ker \omega \), its class [*R*] is a nowhere-vanishing section of the line bundle \(\mathrm {N}\mathcal {C}\), i.e., a frame of \(\mathrm {N}\mathcal {C}\) over the domain of \(\omega \). Consequently, we can identify smooth functions on \(\mathcal {M}\) with sections of \(\mathrm {N}\mathcal {C}\) via \(\phi \mapsto [\phi R]\). Now it is not difficult to prove that \(C_{\phi }=C_{[\phi R]}\) and, conversely, that \(C_{[X]}=C_{\phi }\) for \(\phi =\omega (C_{[X]})=\omega (X)\). The details are left to the reader.

Note, however, that the description of the contact distribution \(\mathcal {C}\) in terms of a contact form \(\omega \) is, in general, non-canonical (as every rescaling of \(\omega \) by a nowhere-vanishing function gives the same kernel \(\mathcal {C}\)) and valid only locally (as there clearly exist contact distributions which cannot be globally presented as kernels of single 1-forms). For this reason, the description of a contact manifold \((\mathcal {M},\mathcal {C})\) in terms of \(\mathcal {C}\) and related objects (e.g., \(\mathrm {N}\mathcal {C}, \beta \)) is more fundamental and often conceptually simpler (for example, in the description of CVFs) than the one in terms of \(\omega \). Not to mention that, for instance, the construction of the CVF \(C_{\phi }\) does depend on the particular choice of \(\omega \), whereas the construction of \(C_{[X]}\) is universal.

### Remark 4

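A *control-affine system* on a manifold \(\mathcal {M}\) is usually understood as a differential equation of the form

$$\dot{x}(t)=f(x(t))+\sum _{i=1}^m u^i(t)\,g_i(x(t)),\tag{3.1}$$

where \(f,g_1,\ldots ,g_m\in \mathfrak {X}(\mathcal {M})\) are vector fields (*f* is usually called a *drift*) and \((u^1,\ldots ,u^m)^T\in \mathbb {R}^m\) are control parameters. Trajectories of the control system (3.1) are integral curves \(\dot{x}(t)\in \mathcal {A}(x(t))\) of the affine distribution

$$\mathcal {A}:=f+\big \langle {g_1,\ldots ,g_m}\big \rangle \subset \mathrm {T}\mathcal {M}.$$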

### Proposition 5

Let \((\mathcal {M},\mathcal {C})\) be a contact manifold. There is a one-to-one correspondence between CVFs on \(\mathcal {M}\) and control-affine systems (equivalently, affine distributions) on \(\mathcal {M}\) of the form \(\mathcal {A}=X+\mathcal {C}\subset \mathrm {T}\mathcal {M}\), where \(X\in \mathfrak {X}(\mathcal {M})\).

Indeed, to every CVF *C* we attach the affine distribution (control-affine system) \(\mathcal {A}=C+\mathcal {C}\). Conversely, given an affine distribution (control-affine system) \(\mathcal {A}=X+\mathcal {C}\), there exists a unique CVF \(C\in \varGamma ( \mathcal {A})\), namely \(C=C_{[X]}\), such that \(\mathcal {A}=C+\mathcal {C}=C_{[X]}+\mathcal {C}\). In other words, on every contact manifold \((\mathcal {M},\mathcal {C})\) there are as many CVFs *C* as control-affine systems \(\mathcal {A}=X+\mathcal {C}\), the correspondence being established by the map \(\mathcal {A}=X+\mathcal {C}\mapsto C_{[X]}\).
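
We continue with the illustrative contact form \(\omega ={\text {d}}z-y\,{\text {d}}x\) on \(\mathbb {R}^3\), which is our assumption. In this case the correspondence of Proposition 5 can be computed explicitly. By the discussion following Remark 3, \(C_{[X]}=C_{\phi }\) with \(\phi =\omega (X)\). The sketch below does the following:

- takes a hypothetical generator \(X=z\partial _x+x\partial _y\);
- forms \(\phi =\omega (X)\) and its gradient by central differences;
- evaluates \(C_{[X]}\) with the formula \(C_{\phi }=-\phi _y\partial _x+(\phi _x+y\phi _z)\partial _y+(\phi -y\phi _y)\partial _z\) (obtained by solving the defining conditions of \(C_{\phi }\) for this \(\omega \));
- confirms that \(C_{[X]}-X\) is \(\mathcal {C}\)-valued, i.e., that \(C_{[X]}\) lies in \(\mathcal {A}=X+\mathcal {C}\).

```python
# Sketch: the unique CVF C_[X] in the affine distribution X + C, for omega = dz - y dx.

H = 1e-5  # finite-difference step (assumption)


def omega(point, v):
    """The contact form dz - y dx evaluated on `v` at `point`."""
    return v[2] - point[1] * v[0]


def generator(point):
    """A hypothetical generator X = z d/dx + x d/dy."""
    x, _, z = point
    return (z, x, 0.0)


def contact_representative(X, point):
    """C_[X] = C_phi with phi = omega(X); the gradient of phi by central differences."""
    def phi(q):
        return omega(q, X(q))
    grad = []
    for j in range(3):
        plus, minus = list(point), list(point)
        plus[j] += H
        minus[j] -= H
        grad.append((phi(plus) - phi(minus)) / (2 * H))
    phi_x, phi_y, phi_z = grad
    y = point[1]
    return (-phi_y, phi_x + y * phi_z, phi(point) - y * phi_y)


p = (1.0, 2.0, 3.0)
C = contact_representative(generator, p)
X = generator(p)
difference = [C[i] - X[i] for i in range(3)]
print(C)                      # approximately (3, -4, 0), i.e. z d/dx - y**2 d/dy at p
print(omega(p, difference))   # approximately 0: C_[X] - X is C-valued
```

The generator is an arbitrary representative. Replacing *X* by \(X+Y\) with \(Y\in \varGamma (\mathcal {C})\) leaves \(\phi =\omega (X)\), and hence \(C_{[X]}\), unchanged.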

## 4 Contact geometry of \(\mathbb {P}(\mathrm {T}^*M)\)

In this section, we shall describe the natural contact structure on \(\mathbb {P}(\mathrm {T}^*M)\) and its relation with the canonical symplectic structure on \(\mathrm {T}^*M\) (see, e.g., [5] or [14]). Later it will turn out that this structure for \(M=Q\times \mathbb {R}\) plays the crucial role in optimal control theory.

*The canonical contact structure on*\(\mathbb {P}(\mathrm {T}^*M)\)

Let us denote the cotangent bundle of a manifold *M* by \(\pi _M:\mathrm {T}^*M\rightarrow M\). The projectivized cotangent bundle \(\mathbb {P}(\mathrm {T}^*M)\) is defined as the space of equivalence classes of non-zero covectors from \(\mathrm {T}^*M\) with \([\theta ]=[\theta ']\) if \(\pi _M(\theta )=\pi _M(\theta ')\) and \(\theta =a\cdot \theta '\) for some scalar \(a\in \mathbb {R}{\setminus }\{0\}\). Clearly, \(\mathbb {P}(\mathrm {T}^*M)\) is naturally a smooth manifold and also a fiber bundle over *M* with the projection \(\pi :\mathbb {P}(\mathrm {T}^*M)\rightarrow M\) given by \(\pi :[\theta ]\mapsto \pi _M(\theta )\). The fiber of \(\pi \) over \(p\in M\) is simply the projective space \(\mathbb {P}(\mathrm {T}^*_p M)\). It is worth noticing that \(\mathbb {P}(\mathrm {T}^*M)\) can be also understood as the space of hyperplanes in \(\mathrm {T}M\) (a *manifold of contact elements*), where we can identify each point \([\theta ]\in \mathbb {P}(\mathrm {T}^*M)\) with the hyperplane \(\mathcal {H}_{[\theta ]}:=\ker \theta \subset \mathrm {T}_{\pi _M(\theta )}M\).
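
At a single point \(p\in M\), the fiber \(\mathbb {P}(\mathrm {T}^*_pM)\) can be handled concretely. Two non-zero covectors define the same point if and only if they are proportional, and the hyperplane \(\mathcal {H}_{[\theta ]}=\ker \theta \) does not change under rescaling. The sketch below uses exact rational arithmetic; the helper names are our own and the covector is assumed non-zero.

```python
# Sketch: points of P(T*_p M) as non-zero covectors up to scale, and their hyperplanes.
from fractions import Fraction


def same_point(theta, theta_prime):
    """[theta] == [theta'] iff the covectors are proportional (all 2x2 minors vanish)."""
    n = len(theta)
    return all(theta[i] * theta_prime[j] == theta[j] * theta_prime[i]
               for i in range(n) for j in range(i + 1, n))


def hyperplane_basis(theta):
    """A basis of ker(theta), i.e. the contact element H_[theta] in T_p M."""
    k = next(i for i, c in enumerate(theta) if c != 0)  # pivot coordinate
    basis = []
    for j in range(len(theta)):
        if j == k:
            continue
        v = [Fraction(0)] * len(theta)
        v[j] = Fraction(1)
        v[k] = Fraction(-theta[j], theta[k])
        basis.append(v)
    return basis


def pairing(theta, v):
    return sum(Fraction(a) * b for a, b in zip(theta, v))


theta = [2, -1, 3]
scaled = [-4, 2, -6]  # -2 * theta: the same point of P(T*_p M)
print(same_point(theta, scaled))                                      # True
print(all(pairing(scaled, v) == 0 for v in hyperplane_basis(theta)))  # True
```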

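### Lemma 6

The distribution

$$\mathcal {C}_{[\theta ]}:=\big \{Y\in \mathrm {T}_{[\theta ]}\mathbb {P}(\mathrm {T}^*M)\ |\ \mathrm {T}\pi (Y)\in \ker \theta \big \}\tag{4.1}$$

is a contact structure on \(\mathbb {P}(\mathrm {T}^*M)\), called the *canonical contact structure on* \(\mathbb {P}(\mathrm {T}^*M)\).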

The fact that (4.1) defines a contact structure is well known in the literature. A proof is given, for instance, in Appendix 4 of Arnold's book [5], where the reasoning is based on the properties of the Liouville 1-form \(\Omega _M\) on the cotangent manifold \(\mathrm {T}^*M\). For the convenience of our considerations in Sect. 5 we shall, however, present a separate proof, quite similar to Arnold's.

### Proof

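Fix a vector field *R* on *M*. The notation is deliberate: in the coordinates introduced below, the lift of *R* will be the Reeb vector field of the resulting contact form (cf. our considerations following Remark 3). The set \(\mathcal {U}_R=\{[\theta ]\ |\ \theta (R)\ne 0\}\subset \mathbb {P}(\mathrm {T}^*M)\) is an open subset and \(\mathcal {U}_R\) projects under \(\pi \) to the open subset \(\{p\ |\ R(p)\ne 0\}\subset M\). In the language of hyperplanes, \(\mathcal {U}_R\) consists of all hyperplanes in \(\mathrm {T}M\) which are transversal to the given field *R*. Clearly the collection of subsets \(\mathcal {U}_R\) for all possible vector fields *R* forms an open covering of \(\mathbb {P}(\mathrm {T}^*M)\). The open subset \(\mathcal {U}_R\subset \mathbb {P}(\mathrm {T}^*M)\) can be naturally embedded as a co-dimension one submanifold in \(\mathrm {T}^*M\) by means of the map

$$i_R:\mathcal {U}_R\longrightarrow \mathrm {T}^*M,\qquad i_R([\theta ]):=\frac{\theta }{\big \langle {\theta ,R}\big \rangle },$$

which picks the unique representative of \([\theta ]\) taking the value 1 on *R*. Define \(\omega _R:=(i_R)^*\Omega _M\), where \(\Omega _M\) is the Liouville 1-form on \(\mathrm {T}^*M\). Since \(\pi _M\circ i_R=\pi \), we get \(\omega _R(Y)=\big \langle {\theta ,\mathrm {T}\pi (Y)}\big \rangle /\big \langle {\theta ,R}\big \rangle \) for \(Y\in \mathrm {T}_{[\theta ]}\mathcal {U}_R\). Hence the kernel of \(\omega _R\) is precisely the distribution (4.1) restricted to \(\mathcal {U}_R\).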

To finish the proof it is enough to check that \(\omega _R\) satisfies the maximum non-degeneracy condition. This can be easily seen by introducing local coordinates \((x^0,x^1,\ldots ,x^n)\) on *M* in which \(R=\partial _{x^0}\); recall that *R* is non-vanishing on \(\pi (\mathcal {U}_R)\), so such a choice is locally possible. Let \((x^i,p_i)\) be the induced coordinates on \(\mathrm {T}^*M\). It is clear that in these coordinates the image \(i_R(\mathcal {U}_R)\subset \mathrm {T}^*M\) is characterized by the equation \(p_0=1\). Thus the Liouville form \(\Omega _M=\sum _{i=0}^np_i{\text {d}}x^i\) restricted to \(i_R(\mathcal {U}_R)\) is simply \({\text {d}}x^0+\sum _{i=1}^np_i{\text {d}}x^i\). Obviously, the pull-back functions \(\widetilde{x}^i:=(i_R)^*x^i\) with \(i=0,\ldots ,n\) and \(\widetilde{p}_j:=(i_R)^*p_j\) with \(j=1,\ldots ,n\) form a coordinate system in \(\mathcal {U}_R\). In these coordinates, the form \(\omega _R=(i_R)^*\Omega _M\) simply reads \({\text {d}}\widetilde{x}^0+\sum _{i=1}^n\widetilde{p}_i{\text {d}}\widetilde{x}^i\). A simple calculation shows that such a one-form satisfies the maximum non-degeneracy condition. We conclude that \(\omega _R\) is, indeed, a contact form on \(\mathcal {U}_R\) for the canonical contact structure on \(\mathbb {P}(\mathrm {T}^*M)\). \(\square \)
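
For completeness, the simple calculation mentioned in the proof can be written out. This is a routine verification, included only as an illustration. Put \(\sigma _i:={\text {d}}\widetilde{p}_i\wedge {\text {d}}\widetilde{x}^i\). These 2-forms commute and satisfy \(\sigma _i\wedge \sigma _i=0\), so only the \(n!\) orderings of \(\sigma _1,\ldots ,\sigma _n\) survive in \((\sum _i\sigma _i)^{\wedge n}\). After wedging with \(\omega _R\), each term \(\widetilde{p}_i{\text {d}}\widetilde{x}^i\) repeats the factor \({\text {d}}\widetilde{x}^i\) and drops out:

$$\begin{aligned} {\text {d}}\omega _R&=\sum _{i=1}^n{\text {d}}\widetilde{p}_i\wedge {\text {d}}\widetilde{x}^i,\qquad ({\text {d}}\omega _R)^{\wedge n}=n!\,\sigma _1\wedge \cdots \wedge \sigma _n,\\ \omega _R\wedge ({\text {d}}\omega _R)^{\wedge n}&=n!\,{\text {d}}\widetilde{x}^0\wedge {\text {d}}\widetilde{p}_1\wedge {\text {d}}\widetilde{x}^1\wedge \cdots \wedge {\text {d}}\widetilde{p}_n\wedge {\text {d}}\widetilde{x}^n\ne 0. \end{aligned}$$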

*Contact transformations of*\(\mathbb {P}(\mathrm {T}^*M)\)*induced by diffeomorphisms*

In this paragraph, we will define contact transformations of \(\mathbb {P}(\mathrm {T}^*M)\) which are natural lifts of diffeomorphisms of the base *M*.

### Definition 7

Let \(F:M\rightarrow M\) be a diffeomorphism. Its tangent map \(\mathrm {T}F:\mathrm {T}M\rightarrow \mathrm {T}M\) induces a natural transformation \(\mathbb {P}(F):\mathbb {P}(\mathrm {T}^*M)\rightarrow \mathbb {P}(\mathrm {T}^*M)\) of the space of hyperplanes in \(\mathrm {T}M\), i.e., given a hyperplane \(\mathcal {H}\subset \mathrm {T}_p M\), we define the hyperplane \(\mathbb {P}(F)(\mathcal {H})\subset \mathrm {T}_{F(p)}M\) to be simply the image \(\mathrm {T}F(\mathcal {H})\). The map \(\mathbb {P}(F)\) shall be called the *contact lift* of *F*.

### Lemma 7

\(\mathbb {P}(F)\) is a contact transformation with respect to the canonical contact structure on \(\mathbb {P}(\mathrm {T}^*M)\).

### Proof

Let *Y* be an element of \(\mathrm {T}_{[\theta ]}\mathbb {P}(\mathrm {T}^*M)\) projecting to \(\mathrm {T}\pi (Y)=:\underline{Y}\) under \(\mathrm {T}\pi \). By diagram (4.4), the tangent map \(\mathrm {T}\mathbb {P}(F)\) sends *Y* to an element of \(\mathrm {T}_{[(F^{-1})^*\theta ]}\mathbb {P}(\mathrm {T}^*M)\) lying over \(\mathrm {T}F(\underline{Y})\).

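Now, if *Y* belongs to the contact distribution \(\mathcal {C}_{[\theta ]}\), i.e., see (4.1), if \(\underline{Y}\in \ker \theta \), then

$$\big \langle {(F^{-1})^*\theta ,\mathrm {T}F(\underline{Y})}\big \rangle =\big \langle {\theta ,\underline{Y}}\big \rangle =0.$$

Hence \(\mathrm {T}F(\underline{Y})\in \ker (F^{-1})^*\theta \) and, consequently, \(\mathrm {T}\mathbb {P}(F)(Y)\in \mathcal {C}_{[(F^{-1})^*\theta ]}\). Thus \(\mathrm {T}\mathbb {P}(F)\) maps \(\mathcal {C}\) into \(\mathcal {C}\). Since \(\mathrm {T}\mathbb {P}(F)\) is a linear isomorphism on each tangent space and \(\mathcal {C}\) has constant rank, it maps \(\mathcal {C}\) onto \(\mathcal {C}\). \(\square \)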
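
For a linear map \(F(x)=Ax\), so that \(\mathrm {T}F=A\) at every point, the proof reduces to linear algebra. The covector \((F^{-1})^*\theta \) is the row vector \(\theta A^{-1}\), and *A* maps \(\ker \theta \) into \(\ker (\theta A^{-1})\). The sketch below checks this with a hypothetical invertible \(2\times 2\) matrix in exact rational arithmetic.

```python
# Sketch: the contact lift of a linear map F(x) = A x acting on a hyperplane ker(theta).
from fractions import Fraction


def inverse_2x2(A):
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("A must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def row_times_matrix(theta, M):
    """The covector theta composed with M, i.e. the row vector theta M."""
    return [sum(theta[i] * M[i][j] for i in range(2)) for j in range(2)]


def matrix_times_vector(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]


A = [[2, 1], [1, 1]]                 # hypothetical invertible linear map
theta = [Fraction(3), Fraction(-1)]
v = [Fraction(1), Fraction(3)]       # spans ker(theta): 3*1 - 1*3 = 0
pushed_theta = row_times_matrix(theta, inverse_2x2(A))  # (F^{-1})^* theta
Av = matrix_times_vector(A, v)                          # T F(v)
print(sum(t * w for t, w in zip(pushed_theta, Av)))     # 0
```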

Let us remark that an alternative way to prove the above result is to show that \((F^{-1})^*\) maps the contact form \(\omega _R\) to \(\omega _{\mathrm {T}F(R)}\). To prove that, one uses the fact that the pullback \((F^{-1})^*\) preserves the Liouville form.

*CVFs on*\(\mathbb {P}(\mathrm {T}^*M)\)*induced by base vector fields* The results of the previous paragraph have their natural infinitesimal version.

### Definition 8

Let \(X\in \mathfrak {X}(M)\) be a smooth vector field. By the *contact lift* of *X* we shall understand the contact vector field \(\mathbf {C}_{X}\) on \(\mathbb {P}(\mathrm {T}^*M)\) whose flow is \(\mathbb {P}(A_{t})\), the contact lift of the flow \(A_{t}\) of *X*.

The correctness of the above definition is a consequence of a simple observation that the contact lift preserves the composition of maps, i.e., \(\mathbb {P}(F\circ G)=\mathbb {P}(F)\circ \mathbb {P}(G)\) for any pair of maps \(F,G:M\rightarrow M\). It follows that the contact lift of the flow \(A_{t}\) is a flow of contact transformations of \(\mathbb {P}(\mathrm {T}^*M)\) and as such it must correspond to some contact vector field (cf. Definition 6).

An analogous reasoning shows the following. Given a Caratheodory TDVF \(X_t\in \mathfrak {X}(M)\) and the related TD flow \(A_{t\tau }:M\rightarrow M\), the contact lift of the latter, i.e., \(\mathbb {P}(A_{t\tau })\), consists of contact transformations and satisfies all the properties of a TD flow. By Proposition 4, \(\mathbb {P}(A_{t\tau })\) is the TD flow associated with some contact TDVF (see also Lemma 2). This field is just \(\mathbf {C}_{X_t}\); the justification of this fact is left to the reader.

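### Lemma 8

Let \(X\in \mathfrak {X}(M)\) and let \(\widetilde{X}\in \mathfrak {X}(\mathbb {P}(\mathrm {T}^*M))\) be any vector field projecting to *X* under \(\mathrm {T}\pi \), i.e., \(\mathrm {T}\pi (\widetilde{X})=X\). Then \(\mathbf {C}_{X}=C_{[\widetilde{X}]}\).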

### Proof

Since \(\mathbb {P}(A_{t})\), the flow of \(\mathbf {C}_{X}\), projects under \(\pi \) to \(A_{t}\), the flow of *X*, we conclude that \(X=\mathrm {T}\pi (\mathbf {C}_{X})\). As we already know from the proof of Lemma 5, a CVF is uniquely determined by its class in \(\mathrm {N}\mathcal {C}\). By (4.1), the \(\mathrm {N}\mathcal {C}\)-class of a field \(Y\in \mathfrak {X}(\mathbb {P}(\mathrm {T}^*M))\) is completely determined by its \(\mathrm {T}\pi \)-projection. In other words, if two fields *Y* and \(Y'\) have the same \(\mathrm {T}\pi \)-projections, then \(Y-Y'\) is a \(\mathcal {C}\)-valued vector field. Thus, the field \(\widetilde{X}\) has the same \(\mathrm {N}\mathcal {C}\)-class as the CVF \(\mathbf {C}_{X}\) so, by the results of Lemma 5 (see also Remark 3), it follows \(\mathbf {C}_{X}=C_{[\widetilde{X}]}\). \(\square \)

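### Remark 5

Choose a vector field *R* on *M* and fix a contact form \(\omega _R=(i_R)^*\Omega _M\) on \(\mathcal {U}_R\subset \mathbb {P}(\mathrm {T}^*M)\). Using the results of Sect. 3 and the contact form \(\omega _R\), the CVF \(\mathbf {C}_{X}\) can be presented as \(C_{\phi }\) for some generating function \(\phi :\mathcal {U}_R\rightarrow \mathbb {R}\). This function is simply the evaluation of \(\omega _R\) at \(\mathbf {C}_{X}\). In fact, since \(\mathrm {T}\pi (\mathbf {C}_{X})=X\),

$$\phi ([\theta ])=\omega _R(\mathbf {C}_{X})([\theta ])=\frac{\big \langle {\theta ,X}\big \rangle }{\big \langle {\theta ,R}\big \rangle },$$

i.e., \(\phi =(i_R)^*H_X\), where \(H_X(\theta ):=\big \langle {\theta ,X}\big \rangle \) is the linear Hamiltonian on \(\mathrm {T}^*M\) associated with the vector field *X* which is being lifted.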
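
At a point \(p\in M\), the generating function of Remark 5 is a ratio of two pairings. It is therefore well defined on \(\mathbb {P}(\mathrm {T}^*M)\): rescaling \(\theta \) multiplies numerator and denominator by the same factor. The sketch below uses hypothetical values of \(\theta \), *X*(*p*) and *R*(*p*). It also evaluates the normalized representative \(i_R([\theta ])=\theta /\langle \theta ,R\rangle \) from the proof of Lemma 6.

```python
# Sketch: phi([theta]) = <theta, X> / <theta, R> at one point p of M (exact arithmetic).
from fractions import Fraction


def pair(theta, v):
    return sum(Fraction(a) * b for a, b in zip(theta, v))


def normalize(theta, R):
    """i_R([theta]) = theta / <theta, R>, defined on U_R where <theta, R> != 0."""
    r = pair(theta, R)
    if r == 0:
        raise ValueError("[theta] is not in U_R: ker(theta) contains R")
    return [Fraction(a) / r for a in theta]


def generating_function(theta, X, R):
    """Generating function of the contact lift of X w.r.t. omega_R, evaluated at [theta]."""
    return pair(normalize(theta, R), X)


theta = [1, 2, -1]  # hypothetical covector at p
X = [3, 0, 1]       # hypothetical value X(p)
R = [0, 1, 1]       # hypothetical value R(p), transversal to ker(theta)
print(generating_function(theta, X, R))                   # 2
print(generating_function([5 * t for t in theta], X, R))  # 2 again: depends only on [theta]
print(pair(normalize(theta, R), R))                       # 1
```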

### Remark 6

It is worth mentioning the following illustrative picture, which was pointed out to us by Janusz Grabowski. Every contact structure on a manifold *N* can be viewed as a homogeneous symplectic structure on some principal \(GL(1,\mathbb {R})\)-bundle over *N*. In the case of the canonical contact structure on \(N=\mathbb {P}(\mathrm {T}^*M)\), the corresponding bundle is simply \(\mathrm {T}^*_0 M\), the cotangent bundle of *M* with the zero section removed. It is equipped with the natural action of \(\mathbb {R}{\setminus }\{0\}=GL(1,\mathbb {R})\), the restriction of multiplication by reals on \(\mathrm {T}^*M\). The canonical symplectic structure is obviously homogeneous with respect to this action. Every homogeneous symplectic dynamics on \(\mathrm {T}_0^*M\) then reduces to contact dynamics on \(\mathbb {P}(\mathrm {T}^*M)\). For more information on this approach the reader should consult [7, 9].

## 5 The Pontryagin maximum principle

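*The Pontryagin Maximum Principle* A *control system* on a manifold *Q* is constituted by a family of vector fields \(f:Q\times U\rightarrow \mathrm {T}Q\) parametrized by a topological space *U*. It can be understood as a parameter-dependent differential equation

$$\dot{q}(t)=f(q(t),u(t)).\tag{CS}$$

A solution *q*(*t*) of (CS) is usually called a *trajectory* of (CS) associated with the *control* *u*(*t*).

A *cost function* \(L:Q\times U\rightarrow \mathbb {R}\) allows us to consider the following *optimal control problem*:

$$\text{minimize }\int _0^TL(q(t),u(t))\,{\text {d}}t\ \text{ over all trajectories of (CS) with fixed } q(0),\ q(T).\tag{OCP}$$

Here we consider controls *u*(*t*) which are locally bounded and measurable. The time interval [0, *T*] is fixed, and we impose fixed-end-point boundary conditions *q*(0) and *q*(*T*). By a solution of the optimal control problem we shall understand a pair (*q*(*t*), *u*(*t*)) satisfying (OCP).

It is convenient to regard the cost as an additional configuration. Set \(\varvec{Q}:=Q\times \mathbb {R}\) and \(\varvec{q}(t):=(q(t),q_0(t))\), where \(q_0(t):=\int _0^tL(q(s),u(s))\,{\text {d}}s\) is the cost accumulated up to time *t*. In fact, \(\varvec{q}(t)\) is a trajectory (associated with the same control *u*(*t*)) of the following extension of (CS):

$$\dot{\varvec{q}}(t)=\varvec{f}(\varvec{q}(t),u(t)),\tag{$\mathbf{CS}$}$$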

with \(\varvec{f}:=(f,L\cdot \partial _{q_0}): \varvec{Q}\times U\rightarrow \mathrm {T}\varvec{Q}=\mathrm {T}Q\times \mathrm {T}\mathbb {R}\). Here, we treat both *f* and *L* as maps from \(\varvec{Q}\times U\) invariant in the \(\mathbb {R}\)-direction in \(\varvec{Q}=Q\times \mathbb {R}\). In other words, we extended (CS) by incorporating the costs \(q_0(t)\) as additional configurations of the system. The evolution of these additional configurations is governed by the cost function *L*. Note that the total cost of the trajectory *q*(*t*) with \(t\in [0,T]\) is precisely \(q_0(T)\). Since the latter is fully determined by the pair (*q*(*t*), *u*(*t*)), it is natural to regard the extended pair \((\varvec{q}(t),u(t))\) rather than (*q*(*t*), *u*(*t*)) as a solution of (OCP).
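
The extended system (**CS**) can be integrated numerically to see that \(q_0(T)\) is the total cost. In the hypothetical one-dimensional example below, \(Q=\mathbb {R}\), \(f(q,u)=u\) and \(L(q,u)=\frac{1}{2}u^2\). The control is piecewise constant, equal to 1 on [0, 1) and to −1 on [1, 2]. Then \(q(2)=q(0)\) and the total cost is 1. A forward Euler scheme with the control sampled at mid-steps is used purely for illustration.

```python
# Sketch: integrate the extended system d/dt (q, q0) = (f(q, u), L(q, u)).


def f(q, u):
    return u              # hypothetical dynamics on Q = R


def L(q, u):
    return 0.5 * u * u    # hypothetical cost function


def control(t):
    return 1.0 if t < 1.0 else -1.0   # bounded and measurable (piecewise constant)


def extended_trajectory(q_init, T=2.0, steps=2000):
    """Forward Euler for the extended state (q, q0) with q0(0) = 0."""
    dt = T / steps
    q, q0 = q_init, 0.0
    for k in range(steps):
        u = control((k + 0.5) * dt)   # sample the control in the middle of the step
        q, q0 = q + dt * f(q, u), q0 + dt * L(q, u)
    return q, q0


q_T, total_cost = extended_trajectory(0.0)
print(q_T, total_cost)   # approximately 0 and 1: q_0(T) is the total cost
```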

Note that the extended configuration space \(\varvec{Q}=Q\times \mathbb {R}\ni \varvec{q}=(q,q_0)\) is equipped with the canonical vector field \(\varvec{\partial }_{q_0}:=(0,\partial _{q_0})\in \mathrm {T}\varvec{Q}=\mathrm {T}Q\times \mathrm {T}\mathbb {R}\). We shall denote the distribution spanned by this field by \(\varvec{\mathcal {R}}\subset \mathrm {T}\varvec{Q}\). The ray \(\varvec{\mathcal {R}}^{-}_{\varvec{q}}:=\{-r\varvec{\partial }_{q_0}\ |\ r\in \mathbb {R}_{+}\}\subset \varvec{\mathcal {R}}_{\varvec{q}}\subset \mathrm {T}_{\varvec{q}}\varvec{Q}\) contained in this distribution will be called the *direction of the decreasing cost at*\(\varvec{q}\in \varvec{Q}\).

Regarding technical assumptions, following [18], we shall assume that *U* is a subset of a Euclidean space and that *f*(*q*, *u*) and *L*(*q*, *u*) are differentiable with respect to the first variable. Moreover, \(f(q,u), L(q,u), \frac{\partial f}{\partial q}(q,u)\) and \(\frac{\partial L}{\partial q}(q,u)\) are assumed continuous as functions of (*q*, *u*). In the light of Theorem 7 below, these conditions guarantee the following: for any choice of a bounded measurable control *u*(*t*) and any initial condition \(\varvec{q}(0)\), equation (**CS**) has a unique (Caratheodory) solution defined in a neighborhood of 0. It will be convenient to denote the TDVFs \(q\mapsto f(q,u(t))\) and \(\varvec{q}\mapsto \varvec{f}(\varvec{q},u(t))\) associated with such a control *u*(*t*) by \(f_{u(t)}\) and \(\varvec{f}_{u(t)}\), respectively. In the language of Sect. 2, the technical assumptions considered above guarantee that \(f_{u(t)}\) and \(\varvec{f}_{u(t)}\) are Caratheodory TDVFs. In particular, their TD flows \(F_{t\tau }:Q\rightarrow Q\) and \(\varvec{F}_{t\tau }:\varvec{Q}\rightarrow \varvec{Q}\), respectively, are well-defined families of (local) diffeomorphisms. Note that if \(\varvec{q}(t)\) with \(t\in [0,T]\) is a solution of (**CS**), then for every \(t,\tau \in [0,T]\) the map \(\varvec{F}_{t\tau }\) is well defined in a neighborhood of \(\varvec{q}(\tau )\).

In the above setting, necessary conditions for the optimality of \((\varvec{q}(t),u(t))\) are formulated in the following PMP

### Theorem 2

*Maximum Principle*

### Definition 9

A pair \((\varvec{q}(t),\widehat{u}(t))\) satisfying the necessary conditions for optimality provided by Theorem 2 (i.e., the existence of a covector curve \(\varvec{\varLambda }_t\) satisfying (5.1)–(5.3)) is called an *extremal*.

### Proof of the PMP

Although the PMP is a commonly known result, for future purposes it will be convenient to sketch its original proof following [18]. \(\square \)

Let \((\varvec{q}(t), \widehat{u}(t))\) be a trajectory of (**CS**). By \(\varvec{F}_{t\tau }: \varvec{Q}\rightarrow \varvec{Q}\), where \(0\le \tau \le t\le T\), denote the TD flow on \(\varvec{Q}\) of the Caratheodory TDVF \(\varvec{f}_{\widehat{u}(t)}\) defined by the control \(\widehat{u}(t)\) (cf. Definition 1). In other words, given a point \(\varvec{q}\in \varvec{Q}\), the curve \(t\mapsto \varvec{F}_{t 0}(\varvec{q})\) is the trajectory of (**CS**) associated with the control \(\widehat{u}(t)\) with the initial condition \(\varvec{q}(0)=\varvec{q}\).

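The proof is based on the technique of *needle variations* and the resulting construction of a family of sets

$$\varvec{\mathcal {K}}_t:=\overline{\Big \{\sum _{i=1}^k\delta t_i\,\mathrm {T}\varvec{F}_{t\tau _i}\Big (\varvec{f}\big (\varvec{q}(\tau _i),u_i\big )-\varvec{f}\big (\varvec{q}(\tau _i),\widehat{u}(\tau _i)\big )\Big )\Big \}}\subset \mathrm {T}_{\varvec{q}(t)}\varvec{Q},\tag{5.4}$$

where \(k\in \mathbb {N}\), \(0<\tau _i\le t\) are regular points of the control \(\widehat{u}(\cdot )\), \(u_i\in U\) and \(\delta t_i\) are arbitrary non-negative numbers. It is easy to see that \(\varvec{\mathcal {K}}_t\) is a closed and convex cone in \(\mathrm {T}_{\varvec{q}(t)}\varvec{Q}\), well defined for each regular point \(t\in (0,T)\) of the control \(\widehat{u}(\cdot )\). What is more, the cones \(\varvec{\mathcal {K}}_t\) are ordered by the TD flow \(\varvec{F}_{t\tau }\), i.e.,

$$\mathrm {T}\varvec{F}_{t\tau }(\varvec{\mathcal {K}}_\tau )\subset \varvec{\mathcal {K}}_t\quad \text{for all regular points } 0<\tau \le t<T.\tag{5.5}$$

The definition of \(\varvec{\mathcal {K}}_t\) is extended to the remaining points of (0, *T*] by setting

$$\varvec{\mathcal {K}}_t:=\overline{\bigcup _{\tau <t}\mathrm {T}\varvec{F}_{t\tau }(\varvec{\mathcal {K}}_\tau )},$$

the union being taken over the regular points \(\tau \) of \(\widehat{u}(\cdot )\).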

The importance of the construction of the cone \(\varvec{\mathcal {K}}_t\) lies in the fact that it approximates the reachable set of the control system (**CS**) at the point \(\varvec{q}(t)\). In particular, it was proved in [18] that if at any point \(t\in [0,T]\), the interior of the cone \(\varvec{\mathcal {K}}_t\) contains the direction of the decreasing cost \(\varvec{\mathcal {R}}^-_{\varvec{q}(t)}\), then the trajectory \(t\mapsto \varvec{q}(t), t\in [0,T]\), cannot be optimal.

### Lemma 9

([18]) If, for any \(0<t\le T\), the ray \(\varvec{\mathcal {R}}^-_{\varvec{q}(t)}\) lies in the interior of \(\varvec{\mathcal {K}}_t\), then \((\varvec{q}(t),\widehat{u}(t))\) cannot be a solution of (OCP).

As a direct corollary, using basic facts about separation of convex sets, one obtains the following

### Proposition 6

([18]) Assume that \((\varvec{q}(t),\widehat{u}(t))\) is a solution of (OCP). Then for each \(t\in (0,T]\) there exists a hyperplane \(\varvec{\mathcal {H}}_t\subset \mathrm {T}_{\varvec{q}(t)}\varvec{Q}\) separating the convex cone \(\varvec{\mathcal {K}}_t\) from the ray \(\varvec{\mathcal {R}}^-_{\varvec{q}(t)}\).

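Thus, if \((\varvec{q}(t),\widehat{u}(t))\) is a solution of (OCP), Proposition 6 provides a family of hyperplanes \(\varvec{\mathcal {H}}_t\subset \mathrm {T}_{\varvec{q}(t)}\varvec{Q}\) separating the cone \(\varvec{\mathcal {K}}_t\) from the ray \(\varvec{\mathcal {R}}^-_{\varvec{q}(t)}\) for each \(t\in (0,T]\). The canonical vector field \(\varvec{\partial }_{q_0}\) is invariant under \(\mathrm {T}\varvec{F}_{t\tau }\), because the extended vector field \(\varvec{f}\) does not depend on the cost \(q_0\). Because of this and of (5.5), we may choose \(\varvec{\mathcal {H}}_t\) in such a way that

$$\varvec{\mathcal {H}}_t=\mathrm {T}\varvec{F}_{t\tau }(\varvec{\mathcal {H}}_\tau )\quad \text{for all } \tau ,t\in (0,T].\tag{5.6}$$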

### Remark 7

Trajectories of (**CS**) satisfying the above necessary conditions for optimality (i.e., the existence of a curve of separating hyperplanes \(\varvec{\mathcal {H}}_t\) which satisfies (5.6)) can be classified according to the relative position of the hyperplanes \(\varvec{\mathcal {H}}_t\) and the line field \(\varvec{\mathcal {R}}\subset \mathrm {T}\varvec{Q}\). The hyperplanes \(\varvec{\mathcal {H}}_t\) evolve according to the TD flow \(\varvec{F}_{t\tau }\) of the TDVF \(\varvec{f}_{\widehat{u}(t)}\), which leaves the distribution \(\varvec{\mathcal {R}}\) invariant. Hence, whenever \(\varvec{\mathcal {R}}_{\varvec{q}(\tau )}\subset \varvec{\mathcal {H}}_\tau \) at a particular point \(\tau \in [0,T]\), then \(\varvec{\mathcal {R}}_{\varvec{q}(t)}\subset \varvec{\mathcal {H}}_t\) for every \(t\in [0,T]\). We call a trajectory \(\varvec{q}(t)\) of (**CS**) satisfying the above necessary conditions for optimality:

- *normal* if \(\varvec{\mathcal {R}}_{\varvec{q}(t)}\not \subset \varvec{\mathcal {H}}_t\) for every \(t\in [0,T]\). Note that, in consequence, the ray \(\varvec{\mathcal {R}}^{-}_{\varvec{q}(t)}\) can be strictly separated from the cone \(\varvec{\mathcal {K}}_t\) for each \(t\in [0,T]\);
- *abnormal* if \(\varvec{\mathcal {R}}_{\varvec{q}(t)}\subset \varvec{\mathcal {H}}_t\) for each \(t\in [0,T]\);
- *strictly abnormal* if for some \(t\in [0,T]\) the ray \(\varvec{\mathcal {R}}^-_{\varvec{q}(t)}\) cannot be strictly separated from the cone \(\varvec{\mathcal {K}}_t\) (and thus \(\varvec{\mathcal {R}}_{\varvec{q}(t)}\subset \varvec{\mathcal {H}}_t\) for each \(t\in [0,T]\)).
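
Write the separating hyperplane as \(\varvec{\mathcal {H}}_t=\ker \varvec{\lambda }\) for a covector \(\varvec{\lambda }\). The first two cases above then differ simply by whether \(\big \langle {\varvec{\lambda },\varvec{\partial }_{q_0}}\big \rangle \) vanishes. The sketch below uses hypothetical data in coordinates \((q,q_0)\) with \(\dim Q=1\). Our orientation convention is that \(\varvec{\lambda }\le 0\) on the cone and \(\varvec{\lambda }\ge 0\) on the ray. The sketch checks separation of a finitely generated cone from \(\varvec{\mathcal {R}}^-\) and then classifies the hyperplane. Strict abnormality is a property of the cone and is not tested here.

```python
# Sketch: separation of a finitely generated cone from the ray R^- = {-r d/dq0},
# and the normal/abnormal classification of the separating hyperplane ker(lam).
# Coordinates (q, q0) with dim Q = 1; all data below are hypothetical.

MINUS_D_Q0 = (0.0, -1.0)   # generator of the direction of decreasing cost


def pair(lam, v):
    return sum(a * b for a, b in zip(lam, v))


def separates(lam, cone_generators):
    """lam <= 0 on the cone (checking generators suffices) and lam >= 0 on R^-."""
    return (all(pair(lam, k) <= 0 for k in cone_generators)
            and pair(lam, MINUS_D_Q0) >= 0)


def classify(lam):
    """Normal iff d/dq0 is not contained in ker(lam)."""
    return "normal" if pair(lam, (0.0, 1.0)) != 0 else "abnormal"


cone = [(1.0, 1.0), (-1.0, 1.0)]
lam_normal = (0.0, -1.0)
print(separates(lam_normal, cone), classify(lam_normal))        # True normal

cone_2 = [(1.0, 0.0), (1.0, -1.0)]
lam_abnormal = (-1.0, 0.0)
print(separates(lam_abnormal, cone_2), classify(lam_abnormal))  # True abnormal
```

For `cone_2` a separating hyperplane transversal to \(\varvec{\partial }_{q_0}\) also exists, e.g., \(\varvec{\lambda }=(-1,-\frac{1}{2})\). The output of `classify` therefore refers to the chosen hyperplane, not to the cone alone.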

Note that, as we have already observed in Remark 7, for abnormal solutions, we have \(\varvec{\partial }_{q_0}\in \varvec{\mathcal {H}}_t=\ker \varvec{\lambda }(t)\), and thus \(\big \langle {\varvec{\lambda }(t),\varvec{\partial }_{q_0}}\big \rangle \equiv 0\). For normal solutions it is possible to choose \(\varvec{\lambda }(t)\) in such a way that \(\big \langle {\varvec{\lambda }(t),-\varvec{\partial }_{q_0}}\big \rangle \equiv 1\) along the optimal solution.

*The contact formulation of the PMP* We can express the essential geometric information of the PMP (see Fig. 2) in terms of hyperplanes \(\varvec{\mathcal {H}}_t\) instead of covectors \(\varvec{\lambda }(t)\). Combining this with our considerations about the canonical contact structure on \(\mathbb {P}(\mathrm {T}^*M)\) (see Sect. 4) allows us to formulate the following contact version of the PMP.

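### Theorem 3

Let \((\varvec{q}(t),\widehat{u}(t))\) be a solution of (OCP). Then there exists a curve of hyperplanes \(\varvec{\mathcal {H}}_t\subset \mathrm {T}_{\varvec{q}(t)}\varvec{Q}\), \(t\in (0,T]\), i.e., a curve \(t\mapsto \varvec{\mathcal {H}}_t\) in \(\mathbb {P}(\mathrm {T}^*\varvec{Q})\) lying over \(\varvec{q}(t)\), which evolves according to the contact TDVF \(\mathbf {C}_{\varvec{f}_{\widehat{u}(t)}}\), the contact lift of the extremal vector field:

$$\frac{{\text {d}}}{{\text {d}}t}\varvec{\mathcal {H}}_t=\mathbf {C}_{\varvec{f}_{\widehat{u}(t)}}(\varvec{\mathcal {H}}_t)\quad \text{for almost every } t\in (0,T].\tag{5.7}$$

Moreover, each \(\varvec{\mathcal {H}}_t\) separates the convex cone \(\varvec{\mathcal {K}}_t\) defined by (5.4) from the ray \(\varvec{\mathcal {R}}^{-}_{\varvec{q}(t)}\).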

### Proof

The family of hyperplanes \(\varvec{\mathcal {H}}_t\) separating the cone \(\varvec{\mathcal {K}}_t\) from the ray \(\varvec{\mathcal {R}}^{-}_{\varvec{q}(t)}\) and satisfying (5.6) was constructed in the course of the proof of Theorem 2 sketched in the previous paragraph. To end the proof it is enough to check that \(\varvec{\mathcal {H}}_t\) evolves according to (5.7). From (5.6) and Definition 7 of the contact lift we know that \(\varvec{\mathcal {H}}_t=\mathbb {P}(\varvec{F}_{t\tau })(\varvec{\mathcal {H}}_\tau )\), i.e., \(\varvec{\mathcal {H}}_t\) evolves according to \(\mathbb {P}(\varvec{F}_{t\tau })\). By the remark following Definition 8, this is precisely the TD flow induced by the TDVF \(\mathbf {C}_{\varvec{f}_{\widehat{u}(t)}}\). \(\square \)

Let us remark that the contact dynamics (5.7) are valid regardless of whether the considered solution is normal or abnormal. There is a unique contact TDVF \(\mathbf {C}_{\varvec{f}_{\widehat{u}(t)}}\) on \(\mathbb {P}(\mathrm {T}^*\varvec{Q})\) governing the dynamics of the separating hyperplanes \(\varvec{\mathcal {H}}_t\). The difference between normal and abnormal solutions lies in the relative position of the hyperplanes \(\varvec{\mathcal {H}}_t\) with respect to the canonical vector field \(-\varvec{\partial }_{q_0}\) on \(\varvec{Q}\).

### Remark 8

Actually, the fact that the evolution of \(\varvec{\mathcal {H}}_t\) is contact (and, at the same time, that the evolution of \(\varvec{\varLambda }_t\) is Hamiltonian) is in a sense “accidental”. Namely, it is merely a natural contact (Hamiltonian) evolution induced on \(\mathbb {P}(\mathrm {T}^*\varvec{Q})\) (on \(\mathrm {T}^*\varvec{Q}\)) by the TD flow on \(\varvec{Q}\) defined by means of the extremal vector field. In the Hamiltonian case this was, of course, already observed; see, e.g., Chapter 12 in [2]. Thus, it is perhaps more appropriate to speak about *covariant* (in terms of hyperplanes) and *contravariant* (in terms of covectors) *formulations of the PMP*, rather than about its contact and Hamiltonian versions. The choice between these two approaches may seem a matter of personal taste. Yet the covariant formulation is clearly closer to the original geometric meaning of the PMP, since it contains direct information about the separating hyperplanes. In the contravariant version, this information is translated into the language of covectors (not to mention the non-uniqueness of the choice of \(\varvec{\varLambda }_t\)). In Sect. 6 we shall show a few applications of the covariant approach to sub-Riemannian geometry. Expressing optimality in the language of hyperplanes allows us to see a direct relation between abnormal extremals and special directions in the constraint distribution. It also provides an elegant geometric characterization of normal extremals.

Although Eq. (5.7) has a very clear geometric interpretation, in applications it is more convenient to avoid calculating the contact lift. Combining (5.6) with Theorem 1 allows us to replace Eq. (5.7) by a simple condition involving the Lie bracket.

### Theorem 4

### Proof

The proof is immediate. The existence of separating hyperplanes \(\varvec{\mathcal {H}}_t\) satisfying (5.6) was already proved in the course of this section. The only part that needs some attention is the justification of equation (5.8). It follows directly from the \(\varvec{F}_{t\tau }\)-invariance along \(\varvec{q}(t)\) of \(\varvec{\mathcal {H}}_t\) and Theorem 1. (Note that \(\varvec{\mathcal {H}}_t\) is charming along \(\varvec{q}(t)\) by Proposition 1.) \(\square \)

In particular, by choosing \(\varvec{R}=\varvec{\partial }_{q_0}\) we can easily recover the results of [17]. Note that \(\varvec{R}=\varvec{\partial }_{q_0}\) is the canonical choice of a vector field transversal to all hyperplanes \(\varvec{\mathcal {H}}_t\) in the normal case; moreover, \(\varvec{R}=\varvec{\partial }_{q_0}\) is \(\varvec{F}_{t\tau }\)-invariant. For such a choice of \(\varvec{R}\), the corresponding embedding \(i_{\varvec{R}}:\mathcal {U}_{\varvec{R}}\hookrightarrow \mathrm {T}^*\varvec{Q}\) is constructed simply by setting \(\big \langle {\varvec{\lambda },\varvec{\partial }_{q_0}}\big \rangle =1\), which is just the standard normalization of the normal solution. The associated contact form is \(\omega _{\varvec{R}}=\pi _2^*{\text {d}}q_0+\pi _1^*\Omega _Q\). Here \(\Omega _Q\) is the Liouville form on \(\mathrm {T}^*Q\), and \(\pi _1\) and \(\pi _2\) are the natural projections of \(\mathcal {U}_{\varvec{R}}\simeq \mathrm {T}^*Q\times \mathbb {R}\) onto \(\mathrm {T}^*Q\) and \(\mathbb {R}\), respectively. As observed above, the generating function of the contact dynamics associated with \(\omega _{\varvec{R}}\) is the linear Hamiltonian (5.2). This is in perfect agreement with the results of Sect. 2 in [17].
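
Concretely, write \(\varvec{\lambda }=(p,p_0)\) with \(p_0=\big \langle {\varvec{\lambda },\varvec{\partial }_{q_0}}\big \rangle \ne 0\). The generating function \(\big \langle {\varvec{\lambda },\varvec{f}}\big \rangle /\big \langle {\varvec{\lambda },\varvec{\partial }_{q_0}}\big \rangle \) of the contact lift then equals \(\langle p/p_0,f(q,u)\rangle +L(q,u)\). The sketch below evaluates it for hypothetical data: \(Q=\mathbb {R}^2\), \(f(q,u)=u\) and \(L=\frac{1}{2}|u|^2\). It also checks invariance under rescaling of \(\varvec{\lambda }\).

```python
# Sketch: generating function of the contact lift of the extended field (f, L d/dq0)
# w.r.t. omega_R with R = d/dq0, at a point (q, q0) and a covector lam = (p, p0).
from fractions import Fraction


def f(q, u):
    return list(u)                                   # hypothetical dynamics on Q = R^2


def L(q, u):
    return Fraction(1, 2) * sum(c * c for c in u)    # hypothetical cost


def generating_function(p, p0, q, u):
    """<lam, f_ext> / <lam, d/dq0> with f_ext = (f(q, u), L(q, u))."""
    if p0 == 0:
        raise ValueError("d/dq0 lies in ker(lam): abnormal case, choose another R")
    numerator = sum(Fraction(a) * b for a, b in zip(p, f(q, u))) + p0 * L(q, u)
    return numerator / p0


q, u = [0, 0], [1, 2]
print(generating_function([3, -1], 1, q, u))   # 3*1 - 1*2 + 5/2 = 7/2
print(generating_function([6, -2], 2, q, u))   # rescaled covector, same value 7/2
```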

In the abnormal case there is no canonical choice of a field \(\varvec{R}\) transversal to the separating hyperplanes. Locally, however, a (non-canonical) choice is possible. The resulting generating function of the contact dynamics (5.7) is again the linear Hamiltonian (5.2).

## 6 Applications to the sub-Riemannian geodesic problem

In this section, we shall apply our covariant approach to the PMP (cf. Remark 8) to concrete problems of optimal control. We shall concentrate our attention on the SR geodesic problem on a manifold *Q*. Our main idea is to extract, from the geometry of the cone \(\varvec{\mathcal {K}}_t\), as much information as possible about the separating hyperplane \(\varvec{\mathcal {H}}_t\) and then use the contact evolution (in the form (5.6) or (5.8)) to determine the actual extremals of the system.

*A sub-Riemannian geodesic problem* To be more precise, we consider a control system obtained by choosing a smooth constant-rank distribution \(\mathcal {D}\subset \mathrm {T}Q\) in the tangent bundle \(\mathrm {T}Q\). Locally (and non-canonically), by taking \(f(q,u)=\sum _{i=1}^d u^if_i(q)\), where \(u=(u^1,u^2,\ldots ,u^d)\) and \(\mathcal {D}=\big \langle {f_1,\ldots ,f_d}\big \rangle \), we may present \(\mathcal {D}\) as the image of a map \(f:Q\times U\rightarrow \mathrm {T}Q\), where \(U=\mathbb {R}^d\), with \(d:={\text {rank}}\mathcal {D}\), is a Euclidean space, i.e., as a control system of type (CS). We shall refer to it as the *SR control system*. In agreement with our notation from Sect. 5, we will also write \(f_u(q)\) instead of \(f(q,u)\in \mathcal {D}_q\).

The *SR geodesic problem* is an optimal control problem of the form (OCP) given by the cost function \(L(q,u):=\frac{1}{2} g(f_u(q),f_u(q))\), where *g* is a smooth fiberwise inner product (the *SR metric*) on \(\mathcal {D}\).

### Definition 10

By a *SR extremal* we shall understand a trajectory \((\varvec{q}(t),\widehat{u}(t))\) of the SR control system satisfying the necessary conditions for optimality given by the PMP (in the form provided by Theorem 2 or, equivalently, Theorems 3 or 4).

*The geometry of cones and separating hyperplanes* Observe that the image \(\varvec{f}(\varvec{q},U)\subset \mathrm {T}_{\varvec{q}}\varvec{Q}=\mathrm {T}_qQ\times \mathrm {T}_{q_0}\mathbb {R}\) is a paraboloid (see Fig. 4). The following fact is a simple consequence of (5.4).

### Lemma 10

### Proof

It follows from (5.4) (after taking \(k=1, t_1=t\), and thus \(\varvec{F}_{tt_1}={\text {id}}_{\varvec{Q}}\)) that \(\varvec{\mathcal {K}}_t\) contains every secant ray \(\mathbb {R}_+\cdot \{\varvec{f}_v(\varvec{q}(t))- \varvec{f}_{{\widehat{u}}(t)}(\varvec{q}(t))\}\) of the paraboloid \(\varvec{f}(\varvec{q}(t),U)=\{\varvec{f}_v(\varvec{q}(t))\ |\ f_v(q(t))\in \mathcal {D}_{q(t)}\}\) passing through the point \(\varvec{f}_{{\widehat{u}}(t)}(\varvec{q}(t))\). Using these secant rays we may approximate, with arbitrary accuracy, every tangent ray of the paraboloid \(\varvec{f}(\varvec{q}(t),U)\) passing through \(\varvec{f}_{{\widehat{u}}(t)}(\varvec{q}(t))\). Since \(\varvec{\mathcal {K}}_t\) is closed, it has to contain this tangent ray and, consequently, the whole tangent space of \(\varvec{f}(\varvec{q}(t),U)\) at \(\varvec{f}_{{\widehat{u}}(t)}(\varvec{q}(t))\) (see Fig. 5). The fact that this tangent space is described by equality (6.1) is an easy exercise. \(\square \)
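The limiting argument can be made concrete with a small numerical sketch. It assumes, for illustration only, that \(\mathcal {D}_{q(t)}\) is identified with \(\mathbb {R}^d\) through a *g*-orthonormal frame, so that the cost is \(\frac{1}{2}g(f_u,f_u)=\frac{1}{2}|u|^2\) and the paraboloid \(\varvec{f}(\varvec{q}(t),U)\) becomes \(\{(u,\frac{1}{2}|u|^2)\}\). The helper names `extended_velocity`, `secant_direction` and `tangent_vector` are hypothetical. Rescaled secants from \(\widehat{u}\) towards \(\widehat{u}+\varepsilon w\) differ from the tangent vector \((w, g(\widehat{u},w))\), i.e., from \(Y+g(f_{\widehat{u}},Y)\partial _{q_0}\) with \(Y=w\), only in the cost component, and only by \(\frac{\varepsilon }{2}|w|^2\).

```python
# Assumption (illustration only): D_{q(t)} is identified with R^d through a
# g-orthonormal frame, so a control u gives the extended velocity (u, |u|^2 / 2).

def extended_velocity(u):
    """Point of the paraboloid: D-component u and cost component |u|^2 / 2."""
    return list(u) + [0.5 * sum(x * x for x in u)]


def secant_direction(u_hat, w, eps):
    """(f_{u_hat + eps*w} - f_{u_hat}) / eps: a positive multiple of a secant ray (eps > 0)."""
    v = [a + eps * b for a, b in zip(u_hat, w)]
    p, p_hat = extended_velocity(v), extended_velocity(u_hat)
    return [(a - b) / eps for a, b in zip(p, p_hat)]


def tangent_vector(u_hat, w):
    """Y + g(f_{u_hat}, Y) d/dq0 for Y = w, written in the same coordinates."""
    return list(w) + [sum(a * b for a, b in zip(u_hat, w))]


u_hat = [1.0, -2.0, 0.5]
w = [0.3, 0.7, -1.0]
for eps in (1e-1, 1e-3, 1e-5):
    gap = max(abs(a - b) for a, b in zip(secant_direction(u_hat, w, eps), tangent_vector(u_hat, w)))
    print(eps, gap)   # the gap is eps * |w|^2 / 2 up to rounding
```

Since the gap vanishes linearly in \(\varepsilon \), every tangent vector of the paraboloid at \(\varvec{f}_{\widehat{u}(t)}(\varvec{q}(t))\) is a limit of rescaled secants, which is exactly what the closedness of \(\varvec{\mathcal {K}}_t\) exploits.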

### Remark 9

In general, for an arbitrary control system and an arbitrary cost function, the cone \(\varvec{\mathcal {K}}_t\) contains all secant rays of the image \(\varvec{f}(\varvec{q}(t),U)\) passing through \(\varvec{f}_{\widehat{u}(t)}(\varvec{q}(t))\). Thus, after passing to the limit, the whole tangent cone to \(\varvec{f}(\varvec{q}(t),U)\) at \(\varvec{f}_{\widehat{u}(t)}(\varvec{q}(t))\) is contained in \(\varvec{\mathcal {K}}_t\). If \(\varvec{f}(\varvec{q}(t),U)\) is a submanifold, as is the case in the SR geodesic problem, this tangent cone is simply the tangent space at \(\varvec{f}_{\widehat{u}(t)}(\varvec{q}(t))\).

Here is an easy corollary of the above lemma and our previous considerations.

### Lemma 11

*q*(*t*) is subject to the evolution equation

*q*(*t*), i.e.,

### Proof

To justify the first part of the assertion, observe that if, in a linear space *V*, a hyperplane \(\mathcal {H}\subset V\) supports a cone \(\mathcal {K}\subset V\) which contains a line \(l\subset \mathcal {K}\) (and all these sets contain the zero vector), then necessarily \(l\subset \mathcal {H}\): a line through 0 that is not contained in \(\mathcal {H}\) crosses \(\mathcal {H}\) transversally at 0, so it has points strictly on both sides of \(\mathcal {H}\), which is impossible for a subset of \(\mathcal {K}\), since \(\mathcal {K}\) lies on one side of \(\mathcal {H}\). Since, by Lemma 10, \(\varvec{\mathcal {K}}_t\) contains the subspace \(\{Y+g(f_{\widehat{u}(t)},Y)\partial _{q_0}\ |\ Y\in \mathcal {D}_{q(t)}\}\), we conclude that this subspace must lie in \(\varvec{\mathcal {H}}_t\).

It turns out that in some cases the above basic information suffices to find SR extremals. Let us study the following two examples.

### Example 1

(Riemannian extremals) In the Riemannian case \(\mathcal {D}=\mathrm {T}Q\) is the full tangent bundle and *g* is a Riemannian metric on *Q*. Let \(\nabla \) be any connection on *Q* compatible with the metric, and denote by \(T_\nabla (X,Y):=\nabla _XY-\nabla _YX-[X,Y]\) its torsion (in particular, if we take the Levi–Civita connection \(\nabla =\nabla ^{LC}\), then \(T_{\nabla ^{LC}}\equiv 0\)).

*g* in terms of the chosen metric-compatible connection \(\nabla \) with torsion \(T_\nabla \). When \(\nabla =\nabla ^{LC}\) is the Levi–Civita connection, the torsion vanishes and we recover the standard geodesic equation
$$\begin{aligned} \nabla ^{LC}_{\dot{q}}\dot{q}=0. \end{aligned}$$
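As a hedged numerical illustration of the Levi–Civita case only (the torsion version is not reproduced here), the following sketch integrates \(\nabla ^{LC}_{\dot q}\dot q=0\) on the unit sphere with metric \(d\theta ^2+\sin ^2\theta \, d\varphi ^2\), whose only nonzero Christoffel symbols are \(\Gamma ^\theta _{\varphi \varphi }=-\sin \theta \cos \theta \) and \(\Gamma ^\varphi _{\theta \varphi }=\Gamma ^\varphi _{\varphi \theta }=\cot \theta \). The example manifold, the initial data and the classical Runge–Kutta integrator are our own choices, not taken from the paper. The sketch checks two well-known properties of Riemannian geodesics: the speed is constant, and the curve stays in the plane of a great circle.

```python
import math


def geodesic_rhs(state):
    """Levi-Civita geodesic equation on the unit sphere, g = dth^2 + sin(th)^2 dph^2."""
    th, _, dth, dph = state
    ddth = math.sin(th) * math.cos(th) * dph ** 2            # = -Gamma^th_{ph ph} dph^2
    ddph = -2.0 * math.cos(th) / math.sin(th) * dth * dph    # = -2 Gamma^ph_{th ph} dth dph
    return (dth, dph, ddth, ddph)


def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = geodesic_rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = geodesic_rhs(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def embed(th, ph):
    """Point of the unit sphere in R^3."""
    return (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))


def embed_velocity(th, ph, dth, dph):
    """Velocity in R^3 of the curve with coordinates (th, ph) and rates (dth, dph)."""
    return (dth * math.cos(th) * math.cos(ph) - dph * math.sin(th) * math.sin(ph),
            dth * math.cos(th) * math.sin(ph) + dph * math.sin(th) * math.cos(ph),
            -dth * math.sin(th))


def speed2(state):
    """Squared g-length of the velocity."""
    th, _, dth, dph = state
    return dth ** 2 + math.sin(th) ** 2 * dph ** 2


p0 = embed(1.0, 0.0)
state = (1.0, 0.0, 0.3, 0.8)                       # (th, ph, dth, dph), away from the poles
v0 = embed_velocity(*state)
normal = (p0[1] * v0[2] - p0[2] * v0[1],           # p0 x v0: normal of the great-circle plane
          p0[2] * v0[0] - p0[0] * v0[2],
          p0[0] * v0[1] - p0[1] * v0[0])
s0 = speed2(state)
max_speed_drift = max_plane_dist = 0.0
for _ in range(5000):                              # integrate up to t = 5 with h = 1e-3
    state = rk4_step(state, 1e-3)
    p = embed(state[0], state[1])
    max_speed_drift = max(max_speed_drift, abs(speed2(state) - s0))
    max_plane_dist = max(max_plane_dist, abs(sum(a * b for a, b in zip(p, normal))))
print(max_speed_drift, max_plane_dist)             # both should be tiny
```

Both printed quantities measure only the integration error; the initial data are chosen so that the great circle stays away from the coordinate singularities at the poles.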

### Example 2

In the following two subsections we shall discuss abnormal and normal SR extremals in full generality.

### 6.1 Abnormal SR extremals

Our previous considerations allow us to give the following characterization of SR abnormal extremals.

### Theorem 5

- (a)
The pair \((\varvec{q}(t),\widehat{u}(t))\) is an abnormal SR extremal.

- (b)
The smallest distribution \(F_{t\tau }\)-invariant along *q*(*t*) and containing \(\mathcal {D}_{q(t)}\), i.e.,
$$\begin{aligned} F_{\bullet }(\mathcal {D})_{q(t)}={\text {vect}}_{\mathbb {R}}\{\mathrm {T}F_{t\tau }(Y)\ |\ Y\in \mathcal {D}_{q(\tau )},\, 0\le \tau \le T\}, \end{aligned}$$
is of rank smaller than \(\dim Q\). Here \(F_{t\tau }\) denotes the TD flow (in *Q*) of the Carathéodory TDVF \(f_{\widehat{u}(t)}\).

*q*(*t*).

Note that Theorem 5 reduces the problem of finding abnormal SR extremals to the study of the minimal distribution \(F_{t\tau }\)-invariant along *q*(*t*) and containing \(\mathcal {D}_{q(t)}\). Often, if *q*(*t*) is sufficiently regular, this problem can be solved by the methods introduced in Lemma 3, which are more practical from the computational viewpoint.

### Corollary 1

Let *X* be a \(C^\infty \)-smooth \(\mathcal {D}\)-valued vector field and let *q*(*t*) with \(t\in [0,T]\) be an integral curve of *X*. Then *q*(*t*) is an SR abnormal extremal in the following two (non-exhaustive) situations:

- The distribution spanned by the iterated Lie brackets of *X* with all possible smooth \(\mathcal {D}\)-valued vector fields, i.e.,
$$\begin{aligned} {\text {ad}}^\infty _X(\mathcal {D})=\big \langle {{\text {ad}}_X^k(Y)\ |\ Y\in \varGamma (\mathcal {D}),\; k=0,1,2,\ldots }\big \rangle , \end{aligned}$$
is of constant rank *r* along *q*(*t*) and \(r<\dim Q\).
- There exists a smooth distribution \(\mathcal {B}\supset \mathcal {D}\) on *Q* of constant co-rank at least one, such that
$$\begin{aligned}{}[X,\mathcal {B}]_{q(t)}\subset \mathcal {B}_{q(t)} \quad \text { for any } \; t\in [0,T]. \end{aligned}$$

The above fact follows directly from Theorem 5, Lemma 3 and Theorem 1. In each of the two cases we have, along *q*(*t*), a constant-rank smooth (and thus charming) distribution which contains \(\mathcal {D}\), is *X*-invariant along *q*(*t*) (and thus, by Theorem 1, also \(F_{t\tau }\)-invariant along *q*(*t*)) and has co-rank at least one. Clearly such a distribution must contain \(F_{\bullet }(\mathcal {D})_{q(t)}\), which consequently also has co-rank at least one.
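For polynomial vector fields, the first criterion of Corollary 1 can be checked mechanically. The sketch below is self-contained (exact rational arithmetic, standard library only) and applies it to the Martinet distribution \(\mathcal {D}=\big \langle {\partial _x,\ \partial _y+\frac{1}{2}x^2\partial _z}\big \rangle \) on \(\mathbb {R}^3\), a standard example that is not discussed in this paper and is used here only as an assumption for illustration. For \(X=\partial _y+\frac{1}{2}x^2\partial _z\), whose integral curves through points with \(x=0\) are the lines \(t\mapsto (0,y_0+t,z_0)\), one has \([X,\partial _x]=-x\partial _z\), so the rank of \({\text {ad}}^\infty _X(\mathcal {D})\) is 2 along \(x=0\) and 3 elsewhere; by Corollary 1 these lines are abnormal SR extremals. The helper names are hypothetical. The code brackets only the frame fields (by the Leibniz rule, function multiples add nothing new at a point), truncates at a finite `kmax`, and evaluates the rank at sample points only, so it does not by itself prove constancy of the rank along a curve (cf. Remark 10).

```python
from fractions import Fraction
from itertools import product

# Polynomials in n variables are dicts {exponent tuple: Fraction coefficient};
# a vector field is a list of n polynomials (its components in coordinates).

def p_add(p, q, sign=1):
    """Return p + sign * q, dropping zero coefficients."""
    r = dict(p)
    for e, c in q.items():
        r[e] = r.get(e, 0) + sign * c
        if r[e] == 0:
            del r[e]
    return r

def p_mul(p, q):
    """Product of two polynomials."""
    r = {}
    for (e1, c1), (e2, c2) in product(p.items(), q.items()):
        r = p_add(r, {tuple(a + b for a, b in zip(e1, e2)): c1 * c2})
    return r

def p_diff(p, i):
    """Partial derivative with respect to the i-th variable."""
    return {e[:i] + (e[i] - 1,) + e[i + 1:]: c * e[i] for e, c in p.items() if e[i] > 0}

def p_eval(p, point):
    """Exact value of p at a point with integer or rational coordinates."""
    total = Fraction(0)
    for e, c in p.items():
        term = Fraction(c)
        for x, k in zip(point, e):
            term *= Fraction(x) ** k
        total += term
    return total

def bracket(X, Y):
    """Lie bracket [X, Y]^i = sum_j (X^j d_j Y^i - Y^j d_j X^i)."""
    n = len(X)
    out = []
    for i in range(n):
        comp = {}
        for j in range(n):
            comp = p_add(comp, p_mul(X[j], p_diff(Y[i], j)))
            comp = p_add(comp, p_mul(Y[j], p_diff(X[i], j)), sign=-1)
        out.append(comp)
    return out

def rank(rows):
    """Rank of a list of rational row vectors (Gaussian elimination)."""
    rows = [list(r) for r in rows]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k][col] != 0:
                f = rows[k][col] / rows[r][col]
                rows[k] = [a - f * b for a, b in zip(rows[k], rows[r])]
        r += 1
    return r

def ad_rank(X, frame, point, kmax):
    """Rank at `point` of span{ad_X^k(Y) : Y in frame, 0 <= k <= kmax}."""
    vectors = []
    for Y in frame:
        Z = Y
        for _ in range(kmax + 1):
            vectors.append([p_eval(c, point) for c in Z])
            Z = bracket(X, Z)
    return rank(vectors)

# Martinet example (an assumption for illustration), coordinates (x, y, z).
one = {(0, 0, 0): Fraction(1)}
d_x = [one, {}, {}]
X = [{}, one, {(2, 0, 0): Fraction(1, 2)}]   # X = d_y + (x^2 / 2) d_z
frame = [d_x, X]                              # a frame of D

print(ad_rank(X, frame, (0, 5, 0), kmax=3))   # a point on the line x = 0
print(ad_rank(X, frame, (1, 0, 0), kmax=3))   # a point off that line
```

With this frame the two calls print 2 and 3, respectively: along \(x=0\) all brackets \({\text {ad}}^k_X(\partial _x)\) with \(k\ge 1\) vanish, while off this line \([X,\partial _x]=-x\partial _z\) completes a basis of \(\mathrm {T}\mathbb {R}^3\).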

### Remark 10

*X*) is not an abnormal SR extremal, yet, as he claims, the distribution \({\text {ad}}^\infty _X(\mathcal {D})\) regarded in the above corollary is of constant rank \(r=4<5\) along this curve. A detailed study of this example reveals, however, that along the investigated curve, \(r=4\) apart from the point (0, 0, 0, 0, 0), where the rank drops down to 3. Thus, the discussed example does not contradict Corollary 1, as the regularity condition is not matched. In fact, the considered curve consists of two pieces of abnormal SR extremals (for \(t>0\) and \(t<0\)) which do not concatenate to a single SR abnormal extremal, even though the concatenation is \(C^\infty \)-smooth. This example shows that the condition \(r <\dim Q\) in Corollary 1 is not sufficient (although it is necessary in the smooth case).

### Proof of Theorem 5

If \((\varvec{q}(t),\widehat{u}(t))\) is an abnormal extremal then, by the results of Lemma 11, \(\mathcal {H}_t\), the \(\mathrm {T}Q\)-projection of the curve of supporting hyperplanes \(\varvec{\mathcal {H}}_t\subset \mathrm {T}_{\varvec{q}(t)} \varvec{Q}\), is a curve of hyperplanes along *q*(*t*) (i.e., a distribution of co-rank one along *q*(*t*)), it contains \(\mathcal {D}_{q(t)}\) and is \(F_{t\tau }\)-invariant along *q*(*t*). In particular, it must contain the smallest distribution \(F_{t\tau }\)-invariant along *q*(*t*) and containing \(\mathcal {D}\) (cf. Proposition 2). Thus, \({\text {rank}}F_{\bullet }(\mathcal {D})_{q(t)}\le {\text {rank}}\mathcal {H}_t=\dim Q-1\).

Conversely, assume that \({\text {rank}}F_{\bullet }(\mathcal {D})_{q(t)}<\dim Q\). By adding (if necessary) to \(F_{\bullet }(\mathcal {D})_{q(t)}\) several vector fields of the form \(F_{t0}(X)\), where \(X\in \mathrm {T}_{q(0)}Q\), we can extend \(F_{\bullet }(\mathcal {D})_{q(t)}\) to \(\mathcal {H}_t\), a co-rank one distribution \(F_{t\tau }\)-invariant along *q*(*t*). Now define the curve of hyperplanes \(\varvec{\mathcal {H}}_t:=\mathcal {H}_t\oplus \mathcal {R}_{q_0(t)}\subset \mathrm {T}_{\varvec{q}(t)}\varvec{Q}\). We claim that \(\varvec{\mathcal {H}}_t\) is a curve of supporting hyperplanes as described in the assertion of Theorem 3. Indeed, the \(\varvec{F}_{t\tau }\)-invariance of \(\varvec{\mathcal {H}}_t\) should be clear, as on the product \(\varvec{Q}=Q\times \mathbb {R}\) the TD flow \(\varvec{F}_{t\tau }\) takes the form \(\varvec{F}_{t\tau }(q,q_0)=(F_{t\tau }(q), B_{t\tau }(q_0))\) for some TD flow \(B_{t\tau }\) on \(\mathbb {R}\). Since \(\mathcal {H}_t\) is \(F_{t\tau }\)-invariant along *q*(*t*), the tangent map of \(\varvec{F}_{t\tau }\) preserves \(\varvec{\mathcal {H}}_t=\mathcal {H}_t\oplus \mathcal {R}_{q_0(t)}\). To prove that \(\varvec{\mathcal {H}}_t\) indeed separates the cone \(\varvec{\mathcal {K}}_t\) from the direction of decreasing cost \(\varvec{\mathcal {R}}^-_{\varvec{q}(t)}\), observe that any vector of the form \(\varvec{f}_{v}(\varvec{q}(t))-\varvec{f}_{\widehat{u}(t)}(\varvec{q}(t))\), where \(f_{v}\in \mathcal {D}_{q(t)}\), lies in \(\mathcal {D}_{q(t)}\oplus \mathcal {R}_{q_0(t)}\subset \varvec{\mathcal {H}}_t\). Moreover, any vector of the form \(\mathrm {T}\varvec{F}_{t\tau }\left[ \varvec{f}_v(\varvec{q}(\tau ))-\varvec{f}_{\widehat{u}(\tau )}(\varvec{q}(\tau ))\right] \), where \(f_{v}\in \mathcal {D}_{q(\tau )}\), lies in \(\mathrm {T}\varvec{F}_{t\tau }(\mathcal {D}_{q(\tau )}\oplus \mathcal {R}_{q_0(\tau )})\subset \mathrm {T}\varvec{F}_{t\tau }(\varvec{\mathcal {H}}_\tau )\subset \varvec{\mathcal {H}}_t\). Thus, the whole cone \(\varvec{\mathcal {K}}_t\) is contained in \(\varvec{\mathcal {H}}_t\) (cf. formula (5.4)). Since also \(\varvec{\mathcal {R}}^-_{\varvec{q}(t)}\subset \varvec{\mathcal {R}}_{\varvec{q}(t)}\subset \varvec{\mathcal {H}}_t\), we conclude that \(\varvec{\mathcal {H}}_t\) indeed separates \(\varvec{\mathcal {K}}_t\) from \(\varvec{\mathcal {R}}^-_{\varvec{q}(t)}\) (in a trivial way).

*q*(*t*) as the \(f_{\widehat{u}(t)}\)-invariance of the latter distribution, and then use Lemma 4 (for \(\mathcal {B}_{q(t)}\supset \mathcal {D}_{q(t)}\ni f_{\widehat{u}(t)}(q(t))\)) to prove that this invariance depends only on \(f_{\widehat{u}(t)}\) and \(F_{\bullet }(\mathcal {D})_{q(t)}\) along *q*(*t*). Now it is enough to check that \(F_{\bullet }(\mathcal {D})_{q(t)}\) itself does not depend on a particular choice of the extension of \(f_{\widehat{u}(t)}(q(t))\) to a neighborhood of *q*(*t*). Assume, then, that \(f_{\widehat{u}'(t)}\) is another extension of \(f_{\widehat{u}(t)}(q(t))\), that \(F_{t\tau }'\) is the related TD flow, and that \(F_{\bullet }'(\mathcal {D})_{q(t)}\) is the minimal distribution \(F_{t\tau }'\)-invariant along *q*(*t*) and containing \(\mathcal {D}_{q(t)}\). Repeating the reasoning from the proof of Lemma 4 we would get

*q*(*t*). From the minimality of \(F_{\bullet }'(\mathcal {D})_{q(t)}\) we conclude that \(F_{\bullet }'(\mathcal {D})_{q(t)}\subset F_{\bullet }(\mathcal {D})_{q(t)}\). Interchanging the roles of \(f_{\widehat{u}(t)}\) and \(f_{\widehat{u}'(t)}\), we obtain the opposite inclusion in an analogous manner. Thus, \(F_{\bullet }(\mathcal {D})_{q(t)}=F_{\bullet }'(\mathcal {D})_{q(t)}\), and so it does not depend on the choice of the extension of \(f_{\widehat{u}(t)}\). This ends the proof. \(\square \)

*Examples*

### Example 3

Let *Y*, *Z* be a local basis of sections of \(\mathcal {D}\). From the form of the growth vector we conclude that the fields *Y*, *Z* and [*Y*, *Z*] are linearly independent, while the distribution

[*Y*, [*Y*, *Z*]] and [*Z*, [*Y*, *Z*]] are linearly dependent relative to the distribution \(\big \langle {Y,Z,[Y,Z]}\big \rangle \), i.e., there exist smooth functions \(\phi ,\psi :Q\rightarrow \mathbb {R}\) such that

### Example 4

(Zelenko) The following example by Igor Zelenko [21] became known to us thanks to a lecture by Boris Doubrov. The interested reader may also consult [3, 8].

*M* with a 2-dimensional distribution \(\mathcal {B}\subset \mathrm {T}M\) of type (2, 3, 5). That is, locally \(\mathcal {B}\) is spanned by a pair of vector fields \(X_1\) and \(X_2\) such that

*M*. Introduce an affine chart [1 : *t*] corresponding to the line \(\mathbb {R}\cdot \{X_1+tX_2\}\) on the fibers of \(Q\rightarrow M\) and define a 2-dimensional distribution \(\mathcal {D}:=\big \langle {\partial _t,X_1+tX_2}\big \rangle \) on *Q*. Our goal is to find abnormal SR extremals for this distribution. We will use Corollary 1 for this purpose.

First let us show that the integral curves of \(\partial _t\) are abnormal extremals. Indeed, it is easy to see that \([\partial _t,X_1+t X_2]=X_2\) and that \([\partial _t,X_2]=0\), i.e., the minimal distribution \({\partial _t}\)-invariant and containing \(\mathcal {D}\) is precisely \(\big \langle {\partial _t,X_1,X_2}\big \rangle \). This distribution is of constant rank smaller than \(6=\dim Q\), so by Corollary 1, indeed, the integral curves of \(\partial _t\) are abnormal extremals.
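For the reader's convenience, the bracket computation behind the first step can be spelled out. We assume, as the text implicitly does, that in the chosen chart \(X_1\) and \(X_2\) do not depend on the fiber coordinate *t* and have no \(\partial _t\)-component, so that \([\partial _t,X_1]=[\partial _t,X_2]=0\). The Leibniz rule \([V,hW]=V(h)W+h[V,W]\) then gives
$$\begin{aligned} [\partial _t,X_1+tX_2]&=[\partial _t,X_1]+\partial _t(t)\,X_2+t\,[\partial _t,X_2]=X_2,\\ {\text {ad}}^2_{\partial _t}(X_1+tX_2)&=[\partial _t,X_2]=0,\qquad {\text {ad}}_{\partial _t}(\partial _t)=0. \end{aligned}$$
Hence the iterated brackets of \(\partial _t\) with the frame \(\{\partial _t,X_1+tX_2\}\) of \(\mathcal {D}\) span \(\big \langle {\partial _t,X_1+tX_2,X_2}\big \rangle =\big \langle {\partial _t,X_1,X_2}\big \rangle \); by the same Leibniz rule, brackets with function multiples of the frame fields add nothing new pointwise.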

*F* is some, a priori unknown, function on *Q*. To calculate the minimal distribution *H*-invariant and containing \(\mathcal {D}\) it is enough to consider the iterated Lie brackets \({\text {ad}}^k_H(\partial _t)\). Skipping some simple calculations one can show that the vector fields

*Q*. Denote \([X_i,X_j]:=\sum _{k=1}^5f^k_{ij} X_k\) for \(i,j=1,\ldots ,5\). Then the Lie bracket \({\text {ad}}^4_H(\partial _t)\) belongs to \(\widetilde{\mathcal {D}}\) if and only if

*H*-invariant). Since \({\text {rank}}\widetilde{\mathcal {D}}=5< \dim Q\), by Corollary 1 the integral curves of *H* (for *F* as above) are abnormal SR extremals associated with \(\mathcal {D}\).

### Example 5

*strongly bracket generating* (SBG) if for any \(p\in Q\) and any \(X\in \varGamma (\mathcal {D})\) non-vanishing at *p* we have

*p*.

### Example 6

*S* is an abnormal extremal. Indeed, in this case \(\mathrm {T}_{q(t)}S\) is obviously a charming distribution \(F_{t\tau }\)-invariant along *q*(*t*) which contains \(\mathcal {D}_{q(t)}\) and is of co-rank at least one in \(\mathrm {T}_{q(t)}Q\). Thus, *q*(*t*) is an abnormal extremal.

### Example 7

(Zhitomirskii) Let \(\mathcal {D}\) be a 2-distribution on a manifold *Q* such that \(\mathcal {D}^2:=\mathcal {D}+[\mathcal {D},\mathcal {D}]\) is of rank 3. In [22] Zhitomirskii introduced the following definition.

*nice with respect to* \(\mathcal {D}\) if

- \(\mathcal {Z}\) is involutive,
- for any \(q\in Q\) we have \(\mathcal {D}_q{\not \subseteq } \mathcal {Z}_q\),
- \({\text {rank}}(\mathcal {D}^2\cap \mathcal {Z})=2\).

*Q*. Clearly \(\mathcal {D}\subset \mathcal {H}\) and, what is more, given any section \(X\in \varGamma (\mathcal L)\) we have \([X,\mathcal {H}]\subset \mathcal {H}\). Indeed, take any \(\mathcal {H}\)-valued vector field *Y*. Since \(\mathcal {H}=\mathcal Z+\mathcal {D}\), we can decompose it (in a non-unique way) as \(Y=Y_1+Y_2\), where \(Y_1\in \varGamma (\mathcal Z)\) and \(Y_2\in \varGamma (\mathcal {D})\). Now \([X,Y]=[X,Y_1]+[X,Y_2]\). Clearly \([X,Y_1]\in \varGamma (\mathcal Z)\), since *X* and \(Y_1\) are \(\mathcal Z\)-valued and \(\mathcal Z\) is involutive. Moreover \([X,Y_2]\in \varGamma (\mathcal {D}^2)\), as both *X* and \(Y_2\) are \(\mathcal {D}\)-valued. We conclude that \([X,Y]=[X,Y_1]+[X,Y_2]\in \varGamma (\mathcal Z+\mathcal {D}^2)=\varGamma (\mathcal {H})\).

Now it should be clear that the smallest distribution containing \(\mathcal {D}\) and invariant with respect to the TD flow of *X* is contained in \(\mathcal {H}\), which is of co-rank one. Thus, by Theorem 5, the integral curves of *X* are abnormal SR extremals.

### 6.2 Normal SR extremals

*q*(*t*). This assumption allows for an elegant geometric characterization of normal SR extremals in terms of the distribution

*g*-orthogonal to \(f_{\widehat{u}(t)}\) along *q*(*t*). Note that \(\mathcal {D}^\perp _{q(t)}\) is a subdistribution of \(\mathcal {D}\) along *q*(*t*).

### Theorem 6

- (a)
The pair \((\varvec{q}(t),\widehat{u}(t))\) is a normal SR extremal.

- (b)
The velocity \(f_{u(t)}(q(t))\) is of class ACB with respect to *t*, and the smallest distribution \(F_{t\tau }\)-invariant along *q*(*t*) and containing \(\mathcal {D}^\perp _{q(t)}\), i.e.,
$$\begin{aligned} F_{\bullet }(\mathcal {D}^\perp )_{q(t)}={\text {vect}}_{\mathbb {R}}\{\mathrm {T}F_{t\tau }(Y)\ |\ Y\in \mathcal {D}_{q(\tau )}, \; g(Y,f_{\widehat{u}(t)})=0, \; 0\le \tau \le T\}, \end{aligned}$$
does not contain \(f_{\widehat{u}(t)}(q(t))\) for any \(t\in [0,T]\). Here \(F_{t\tau }\) denotes the TD flow (in *Q*) of the Carathéodory TDVF \(f_{\widehat{u}(t)}\).

Theorem 3.1 of [4] contains a formulation of the above result equivalent to ours.

Again, if *q*(*t*) is sufficiently regular, we can use the method introduced in Lemma 3 to check condition (b) of the above theorem. The result stated below can easily be derived from Theorem 6 using arguments similar to those in the proof of Corollary 1. For the case \({\text {rank}}\mathcal {D}=2\) it was proved as Theorem 6 in [16].

### Corollary 2

Let *X* be a \(C^\infty \)-smooth \(\mathcal {D}\)-valued vector field and let *q*(*t*) with \(t\in [0,T]\) be an integral curve of *X*. Then *q*(*t*) is an SR normal extremal in the following two (non-exhaustive) situations:

- The distribution spanned by the iterated Lie brackets of *X* and all possible smooth \(\mathcal {D}\)-valued vector fields *g*-orthogonal to *X*, i.e.,
$$\begin{aligned} {\text {ad}}^\infty _X(\mathcal {D}^\perp )=\big \langle {{\text {ad}}_X^k(Y)\ |\ Y\in \varGamma (\mathcal {D}), \; g(X,Y)=0, \; k=0,1,2,\ldots }\big \rangle , \end{aligned}$$
is of constant rank *r* along *q*(*t*) and does not contain *X*(*q*(*t*)) for any \(t\in [0,T]\).
- There exists a smooth distribution \(\mathcal {B}\) on *Q* such that
$$\begin{aligned}{}[X,\mathcal {B}]_{q(t)}\subset \mathcal {B}_{q(t)}, \quad X(q(t))\notin \mathcal {B}_{q(t)} \quad \text { and } \quad \mathcal {D}^\perp _{q(t)}\subset \mathcal {B}_{q(t)} \quad \text { for any } \; t\in [0,T]. \end{aligned}$$

### Proof of Theorem 6

The fact that property (6.5) is equivalent to \(t\mapsto f_{u(t)}(q(t))\) being of class ACB follows directly from Lemma 12. Indeed, the field \(f_{u(t)}\) satisfies \([f_{u(t)},f_{u(t)}]_{q(t)}=0\) a.e. Thus, it is ACB along *q*(*t*) if and only if it is respected by the flow \(F_{t\tau }\).

*t* is a hyperplane transversal to the line \(\mathcal {R}_{\varvec{q}(t)}\subset \mathrm {T}_{\varvec{q}(t)} \varvec{Q}\), it must be of the form

*q*(*t*) which is \(F_{t\tau }\)-invariant, contains \(\mathcal {D}^\perp _{q(t)}\) and is transversal to \(f_{u(t)}(q(t))\). Clearly, \(F_{\bullet }(\mathcal {D}^\perp )_{q(t)}\subset \ker \alpha _t\) and thus it is also transversal to \(f_{u(t)}(q(t))\).

To prove that \(t\mapsto f_{u(t)}(q(t))\) is ACB, observe first that \(\mathcal {D}^\perp _{q(t)}=\ker \alpha _t\cap \mathcal {D}_{q(t)}\) admits locally a *g*-orthonormal basis of ACB sections. Indeed, \(\ker \alpha _t\) is charming since it is \(F_{t\tau }\)-invariant (cf. Proposition 1). Now let \(\{X_1,\ldots ,X_{n-1}\}\) be a local basis of ACB sections of \(\ker \alpha _t\) along *q*(*t*). Choose a minimal subset of this basis, say \(\{X_1,\ldots ,X_s\}\), such that \(\big \langle {X_1,\ldots ,X_s}\big \rangle _{q(t)}\oplus \mathcal {D}^\perp _{q(t)}=\ker \alpha _t\) for every *t* in a relatively compact neighborhood of a given point \(t_0\in [0,T]\). Extend locally the SR metric *g* to a metric \(\widetilde{g}\) on \(\ker \alpha _t\) by taking \(\widetilde{g}\big |_{\mathcal {D}^\perp _{q(t)}}=g\big |_{\mathcal {D}^\perp _{q(t)}}\) and by declaring the vectors \(X_1,\ldots ,X_s\) to be \(\widetilde{g}\)-orthonormal and \(\widetilde{g}\)-orthogonal to \(\mathcal {D}^\perp _{q(t)}\). Clearly, this new metric is ACB in the considered neighborhood of \(t_0\). Now we can apply Lemma 13 to the ACB basis \(\{X_1,\ldots ,X_{n-1}\}\) and obtain an ACB \(\widetilde{g}\)-orthonormal basis \(\{X_1,\ldots ,X_s,Y_{s+1},\ldots ,Y_{n-1}\}\) of \(\ker \alpha _t\). Clearly, by the construction of the Gram–Schmidt algorithm, \(\{Y_{s+1},\ldots ,Y_{n-1}\}\) is a \(\widetilde{g}\)-, and thus also a *g*-orthonormal basis of \(\mathcal {D}^\perp _{q(t)}\) (the relative compactness of the neighborhood is used to ensure that the \(\widetilde{g}\)-lengths of the sections \(X_i\) are bounded away from zero).
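The pointwise algebra behind this use of Lemma 13 is ordinary Gram–Schmidt orthonormalization with respect to a possibly non-Euclidean inner product. The sketch below, with hypothetical helper names `inner` and `gram_schmidt` and an arbitrarily chosen positive-definite Gram matrix `G`, illustrates this algebra at a single point only; the ACB dependence on *t* established in Lemma 13 is not modeled. It shows the two facts the proof relies on: vectors that are already \(\widetilde{g}\)-orthonormal and come first are left unchanged, and all output vectors are mutually \(\widetilde{g}\)-orthonormal. Each step divides by a norm, which is why the norms must stay bounded away from zero.

```python
import math


def inner(G, u, v):
    """g(u, v) = u^T G v for a symmetric positive-definite Gram matrix G."""
    n = len(u)
    return sum(u[i] * G[i][j] * v[j] for i in range(n) for j in range(n))


def gram_schmidt(G, basis, tol=1e-12):
    """Orthonormalize `basis`, in the given order, with respect to g(u, v) = u^T G v."""
    out = []
    for v in basis:
        w = list(v)
        for e in out:                      # remove the g-component along each earlier vector
            c = inner(G, w, e)
            w = [a - c * b for a, b in zip(w, e)]
        norm = math.sqrt(inner(G, w, w))
        if norm < tol:                     # lengths must stay bounded away from zero
            raise ValueError("input vectors are (numerically) linearly dependent")
        out.append([a / norm for a in w])
    return out


# Example: the first basis vector is already g-normalized, g(b0, b0) = 2 * (1/2) = 1.
G = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 3.0]]
B = [[1 / math.sqrt(2), 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
E = gram_schmidt(G, B)
deviation = max(abs(inner(G, E[i], E[j]) - (1.0 if i == j else 0.0))
                for i in range(3) for j in range(3))
print(E[0])         # equals B[0] up to rounding
print(deviation)    # of the order of machine precision
```

Because every step uses only arithmetic operations and square roots of quantities bounded away from zero, the output inherits the regularity of the input in a parameter, which is the content that Lemma 13 makes precise for ACB sections.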

Now let us choose any ACB section \(Y_n\) of \(\mathcal {D}_{q(t)}\) which is transversal to \(\mathcal {D}^\perp _{q(t)}\). Again using Lemma 13, we modify the ACB local basis \(\{Y_{s+1},\ldots ,Y_{n-1},Y_n\}\) of \(\mathcal {D}_{q(t)}\) to a *g*-orthonormal ACB local basis \(\{Y_{s+1},\ldots ,Y_{n-1},\widetilde{Y}_n\}\). Obviously, \(\widetilde{Y}_n(q(t))\) is a *g*-normalized vector *g*-orthogonal to \(\mathcal {D}^\perp _{q(t)}=\big \langle {Y_{s+1},\ldots ,Y_{n-1}}\big \rangle \), thus \(\widetilde{Y}_n(q(t))=\pm f_{u(t)}(q(t))\). Now \(\alpha _t(\widetilde{Y}_n(q(t)))=\pm \alpha _t(f_{u(t)}(q(t)))=\pm 1\). Since both \(\alpha _t\) and \(\widetilde{Y}_n(q(t))\) are continuous with respect to *t*, the sign ± must be constant along [0, *T*]. We conclude that \(t\mapsto f_{u(t)}(q(t))\) is ACB, just as \(t\mapsto \widetilde{Y}_n(q(t))\) is.

Conversely, assume that (b) holds. The crucial step is to build, along the projected trajectory \(q(t)\in Q\), a splitting \(\mathrm {T}_{q(t)} Q=\mathcal {B}_{q(t)}\oplus \big \langle {f_{\widehat{u}(t)}(q(t))}\big \rangle \), where \(\mathcal {B}_{q(t)}\) is a co-rank one distribution along *q*(*t*) which is \(F_{t\tau }\)-invariant along *q*(*t*) and contains \(\mathcal {D}^\perp _{q(t)}\). Such a \(\mathcal {B}_{q(t)}\) can be constructed by adding, if necessary, to \(F_{\bullet }(\mathcal {D}^\perp )_{q(t)}\) several vector fields of the form \(F_{t0}(X_{i})\), where the vectors \(X_{i}\in \mathrm {T}_{q(0)}Q\), together with \(f_{\widehat{u}(0)}(q(0))\), are independent. Clearly, in this way we can build \(\mathcal {B}_{q(t)}\) which is \(F_{t\tau }\)-invariant along *q*(*t*), of co-rank one and contains \(\mathcal {D}^\perp _{q(t)}\). What is more, since \(f_{u(t)}(q(t))\) is \(F_{t\tau }\)-invariant, the flow \(F_{t\tau }\) respects the splitting \(\mathcal {B}_{q(t)}\oplus \big \langle {f_{\widehat{u}(t)}(q(t))}\big \rangle \).

*A remark on smoothness of normal SR geodesics* As proved above, normal SR extremals are \(C^1\)-smooth (moreover, their derivatives are ACB maps). It is worth discussing the geometric reasons for this regularity in a less technical manner than in the proof of Theorem 6. Let \(\varvec{q}(t)\) be such an extremal and let \(\varvec{\mathcal {H}}_t\) be the corresponding curve of supporting hyperplanes. As we know from Lemma 11,

*q*(*t*). Indeed, since \(\varvec{\mathcal {H}}_t\) is \(\varvec{F}_{t\tau }\)-invariant, it must be continuous. Note that, by the continuity of \(\varvec{\mathcal {H}}_t\), the limit subspaces \(\mathcal {D}^\perp _{q(t_0)\pm }\oplus 0\cdot \partial _{q_0}\) coming from both sides of a given point \(t_0\in [0,T]\) must belong to \(\varvec{\mathcal {H}}_{t_0}\). Now if \(\varvec{q}(t)\) had a corner-type singularity at \(t_0\), these limit subspaces would be different and thus they would together span the whole space \(\mathcal {D}_{q(t_0)}\oplus 0\cdot \partial _{q_0}\) (cf. Fig. 6). In particular, \(f_{u(t_0)}(q(t_0))+0\cdot \partial _{q_0}\in \mathcal {D}_{q(t_0)}\oplus 0\cdot \partial _{q_0}\) would belong to \(\varvec{\mathcal {H}}_{t_0}\). Yet, since \(f_{u(t_0)}(q(t_0))+\partial _{q_0}\in \varvec{\mathcal {H}}_{t_0}\), this would mean that also the difference of the latter vectors, \(0+\partial _{q_0}\), lies in \(\varvec{\mathcal {H}}_{t_0}\), which is impossible since \(\varvec{q}(t)\) is normal.

A cusp-type singularity is handled in a similar way. At a cusp we would have limit vectors \(\pm f_{u(t_0)}(q(t_0))+\partial _{q_0}\) in \(\varvec{\mathcal {H}}_{t_0}\) (see Fig. 6). Then their sum, \(0+2\partial _{q_0}\), would belong to \(\varvec{\mathcal {H}}_{t_0}\), which contradicts the normality of the extremal. Roughly speaking, the existence of corner-type or cusp-type singularities implies \(\partial _{q_0}\in \varvec{\mathcal {H}}_{t_0}\), i.e., either the trajectory is not an extremal or it is abnormal.

*Examples*

### Example 8

(Geodesic equation revisited) Theorem 6 provides an alternative way to derive the geodesic equation in the Riemannian case (i.e., when \(\mathcal {D}=\mathrm {T}Q\)). Let \((\varvec{q}(t),\widehat{u}(t))\) be a trajectory of the SR control system (we shall assume that \(f_{\widehat{u}(t)}\) is normalized). Since \(\mathcal {D}=\mathrm {T}Q\), by the assertion of Theorem 5, in the Riemannian case there are no abnormal extremals.

*q*(*t*) containing \(\mathcal {D}^\perp _{q(t)}\) is \(\mathrm {T}_{q(t)} Q\), which also contains \(f_{\widehat{u}(t)}(q(t))\). Now, by Theorem 6, \((\varvec{q}(t),\widehat{u}(t))\) is a normal extremal if and only if \(F_{\bullet }(\mathcal {D}^\perp )_{q(t)}=\mathcal {D}^\perp _{q(t)}\), i.e., if

*Y*, after introducing a metric-compatible connection as in Example 1, we have

### Example 9

*Y* and *Z* form an orthonormal basis. Such a system is usually called the *Heisenberg system*. It is easy to check that the system in question is strongly bracket generating (cf. Example 5) and as such does not admit any abnormal SR extremal. Our goal will thus be to determine the normal SR extremals using the results of Theorem 6.

*q*(*t*) of *X* is an SR normal extremal if and only if \(F_{\bullet }(\mathcal {D}^\perp )_{q(t)}\) does not contain *X* at any point *q*(*t*). Clearly the distribution \(F_{\bullet }(\mathcal {D}^\perp )_{q(t)}\), being \({\text {ad}}_X\)-invariant, contains the fields \(X', [X,X'], [X,[X,X']]\), etc. Skipping some simple calculations one can show that

*X* if and only if \(X(\alpha )\ne 0\). Thus, a necessary condition for an integral curve *q*(*t*) of *X* to be a normal SR extremal is that \(\alpha =const\) along *q*(*t*). Note that if \(X(\alpha )=0\), then the integral curves of *X* will indeed be normal SR extremals, as then \([X,[X,X']]=\beta [X,X']+X(\beta ) X'\) and, consequently, \(F_{\bullet }(\mathcal {D}^\perp )_{q(t)}\) will be equal to the 2-dimensional distribution \(\big \langle {X',[X,X']}\big \rangle \), which does not contain *X* (cf. Corollary 2).

*X*, leads to

*q*(*t*)), and for \(\alpha \ne 0\) we get \(x={\text {arctan}}(\alpha t+\gamma )\) (i.e., \(\phi =\cos (\alpha t+\gamma )\) and \(\psi =\sin (\alpha t+\gamma )\)). This corresponds to the two well-known families of normal SR extremals of the Heisenberg system (see Sect. 2 of [16]), whose projections to the (*x*, *y*)-plane are straight lines and circles, respectively.
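The shape of these projections can be checked numerically. The sketch below assumes one common convention for the Heisenberg frame, \(Y=\partial _x-\frac{y}{2}\partial _z\) and \(Z=\partial _y+\frac{x}{2}\partial _z\) (the definitions used in this example are not reproduced here, so this is an assumption), and integrates \(\dot q=\cos (\alpha t+\gamma )Y+\sin (\alpha t+\gamma )Z\) from the origin with a classical Runge–Kutta scheme; `heisenberg_rhs` and `integrate` are hypothetical names. For \(\alpha \ne 0\) the \((x,y)\)-projection should be a circle of radius \(1/|\alpha |\), and after one loop the *z*-coordinate should equal the enclosed area \(\pi /\alpha ^2\) (since \(\dot z=\frac{1}{2}(x\dot y-y\dot x)\)); for \(\alpha =0\) the projection is a straight line. The sketch only integrates the stated controls and does not itself verify extremality.

```python
import math


def heisenberg_rhs(t, p, alpha, gamma):
    """Velocity cos(alpha t + gamma) Y + sin(alpha t + gamma) Z for the assumed frame
    Y = d_x - (y/2) d_z,  Z = d_y + (x/2) d_z."""
    x, y, _ = p
    phi, psi = math.cos(alpha * t + gamma), math.sin(alpha * t + gamma)
    return (phi, psi, -0.5 * y * phi + 0.5 * x * psi)


def integrate(alpha, gamma, T, n=20000):
    """Classical RK4 from the origin up to time T; returns all computed points."""
    h, t, p = T / n, 0.0, (0.0, 0.0, 0.0)
    pts = [p]
    for _ in range(n):
        k1 = heisenberg_rhs(t, p, alpha, gamma)
        k2 = heisenberg_rhs(t + h / 2, tuple(a + h / 2 * b for a, b in zip(p, k1)), alpha, gamma)
        k3 = heisenberg_rhs(t + h / 2, tuple(a + h / 2 * b for a, b in zip(p, k2)), alpha, gamma)
        k4 = heisenberg_rhs(t + h, tuple(a + h * b for a, b in zip(p, k3)), alpha, gamma)
        p = tuple(a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(p, k1, k2, k3, k4))
        t += h
        pts.append(p)
    return pts


alpha, gamma = 2.0, 0.3
pts = integrate(alpha, gamma, T=2 * math.pi / alpha)       # one full loop
center = (-math.sin(gamma) / alpha, math.cos(gamma) / alpha)
radius_error = max(abs(math.hypot(x - center[0], y - center[1]) - 1 / alpha)
                   for x, y, _ in pts)
print(radius_error)                       # deviation from a circle of radius 1/|alpha|
print(pts[-1][2], math.pi / alpha ** 2)   # z after one loop vs. the enclosed area
```

Setting `alpha = 0.0` in `integrate` gives the straight-line family, whose points satisfy \(y\cos \gamma -x\sin \gamma =0\).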

## Footnotes

- 1.
Sometimes it is convenient to identify a TDVF \(X_t\) on *M* with the vector field \(\widetilde{X}(x,t)=X_t(x)+\partial _t\) on \(M\times \mathbb {R}\). Within this identification Eq. (2.1) is the *M*-projection of the autonomous ODE \((\dot{x},\dot{t})=\widetilde{X}(x,t)\) defined on \(M\times \mathbb {R}\).
- 2.
From now on, geometric objects and constructions associated with the extended configuration space \(\varvec{Q}\) will be denoted by bold symbols, e.g., \(\varvec{f}, \varvec{q}, \varvec{F}_{tt_0}, \varvec{\mathcal {H}}_t\) etc. Normal-font symbols, e.g., \(f, q, F_{tt_0}, \mathcal {H}_t\), will denote the analogous objects in *Q*, which are, in general, projections of the corresponding objects from \(\varvec{Q}\).
- 3.
From now on we will use the symbols \(\varvec{F}_{t\tau }\) and \(F_{t\tau }\) to denote the TD flows of the TDVFs \(\varvec{f}_{u(t)}\) and \(f_{u(t)}\), respectively, for a particular control *u*(*t*). Note that \(\varvec{F}_{t\tau }\) projects to \(F_{t\tau }\) under \(\pi _1:\varvec{Q}=Q\times \mathbb {R}\rightarrow Q\).
- 4.
In the original proof in [18], the optimal control problem with a free time interval [0, *T*] is considered. In this case, the sets \(\varvec{\mathcal {K}}_t\) contain additional elements.
- 5.
By choosing \(\varvec{\mathcal {K}}_0:=\{0\}\) we can easily extend \(\varvec{\mathcal {H}}_t\) to the whole interval [0, *T*].
- 6.
We leave the proof of the fact that the Lie bracket (2.3) satisfies the Leibniz rule as an exercise.

- 7.
The existence of measurable functions \(\phi _i^{\ j}\) can be justified in a similar manner to the existence of ACB functions \(\phi _i\) above.

## Notes

### Acknowledgments

This research was supported by the National Science Center under the Grant DEC-2011/02/A/ST1/00208 “Solvability, chaos and control in quantum systems”.

### References

- 1. Agrachev AA, Barilari D, Boscain U (2012) Introduction to Riemannian and sub-Riemannian geometry. SISSA 9:1–331 (Preprint)
- 2. Agrachev AA, Sachkov YL (2004) Control theory from the geometric viewpoint, Encyclopaedia Math. Sci., vol 87. Springer, Berlin
- 3. Agrachev AA, Zelenko I (2006) Nurowski’s conformal structures for (2,5)-distributions via dynamics of abnormal extremals. In: Proceedings of RIMS symposium on “Developments of Cartan Geometry and Related Mathematical Problems”, pp 204–218. “RIMS Kokyuroku” series 1502
- 4. Alcheikh M, Orro P, Pelletier F (1997) Characterizations of Hamiltonian geodesics in sub-Riemannian geometry. J Dyn Control Syst 3:391–418
- 5. Arnold VI (1989) Mathematical methods of classical mechanics. Graduate texts in mathematics. Springer, Berlin
- 6. Bressan A, Piccoli B (2004) Introduction to the mathematical theory of control, AIMS series on applied mathematics, vol 2. Springer, Berlin
- 7. Bruce AJ, Grabowska K, Grabowski J (2015) Remarks on contact and Jacobi geometry. arXiv:1507.05405 [math-ph]
- 8. Doubrov B, Zelenko I (2012) Prolongation of quasi-principal frame bundles and geometry of flag structures on manifolds. arXiv:1210.7334 [math.DG]
- 9. Grabowski J (2013) Graded contact manifolds and contact Courant algebroids. J Geom Phys 68:27–58
- 10. Grabowski J, Jóźwikowski M (2011) Pontryagin maximum principle—a generalization. SIAM J Control Optim 49:1306–1357
- 11. Jafarpour S, Lewis AD (2014) Time-varying vector fields and their flows. Springer briefs in mathematics. Springer, Berlin
- 12. Jakubczyk B, Kryński W, Pelletier F (2009) Characteristic vector fields of generic distributions of corank 2. Ann. Inst. H. Poincaré Anal. Non Linéaire 26:23–38
- 13. Lewis AD (2006) The Maximum Principle of Pontryagin in control and in optimal control. Handouts for the course taught at the Universitat Politecnica de Catalunya
- 14. Libermann P, Marle CM (1987) Symplectic geometry and analytical mechanics, mathematics and its applications, vol 35. Springer, Berlin
- 15. Liberzon D (2012) Calculus of variations and optimal control theory: a concise introduction. Princeton University Press, Princeton
- 16. Liu W, Sussmann HJ (1995) Shortest paths for sub-Riemannian metrics on rank-two distributions, Memoirs of the American Mathematical Society, vol 564. American Mathematical Society, Providence
- 17. Ohsawa T (2015) Contact geometry of the Pontryagin maximum principle. Autom. J. IFAC 55:1–5
- 18. Pontryagin LS, Mishchenko EF, Boltyanskii VG, Gamkrelidze RV (1962) The mathematical theory of optimal processes. Wiley, New York
- 19. Sussmann HJ (1973) Orbits of families of vector fields and integrability of distributions. Trans. Am. Math. Soc. 180:171–188
- 20. Sussmann HJ (1998) An introduction to the coordinate-free maximum principle. In: Jakubczyk B, Respondek W (eds) Geometry of feedback and optimal control. Monographs and textbooks in pure and applied mathematics, vol 207. Marcel Dekker, New York
- 21. Zelenko I (2006) Fundamental form and Cartan’s tensor of (2,5)-distributions coincide. J Dyn Control Syst 12:247–276
- 22. Zhitomirskii M (1995) Rigid and abnormal line subdistributions of 2-distributions. J Dyn Control Syst 1:253–294

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.