Control on the Manifolds of Mappings with a View to the Deep Learning

Agrachev, Andrei; Sarychev, Andrey

doi:10.1007/s10883-021-09561-2

Open access
Published: 14 August 2021

Volume 28, pages 989–1008, (2022)

Journal of Dynamical and Control Systems

Abstract

Deep learning of the artificial neural networks (ANN) can be treated as a particular class of interpolation problems. The goal is to find a neural network whose input-output map approximates well the desired map on a finite or an infinite training set. Our idea consists of taking as an approximant the input-output map, which arises from a nonlinear continuous-time control system. In the limit such control system can be seen as a network with a continuum of layers, each one labelled by the time variable. The values of the controls at each instant of time are the parameters of the layer.

1 Introduction and Problem Setting

The name deep learning stands for a set of the methods and the tools which study the problems of classification such as image recognition and speech recognition. These methods involve multilayered artificial neural networks (ANN) and one of the key moments is the training of the networks on a set of classified objects. For a simple mathematical model of the multilayered ANN and of the process of its training we refer to [10].

The functioning of an ANN results from the composition of the actions of individual neurons. Each neuron realizes an activation function $\sigma : \mathbb {R} \rightarrow \mathbb {R}$ with parameters. There are many possible choices of the activation function; normally it is a nonlinear, monotone, sigmoid-like function.

The vector functions can be assembled from the scalar activation functions:

$$ \bar \sigma : \mathbb{R}^{m} \rightarrow \mathbb{R}^{m} : \ \bar \sigma (x_{1}, {\ldots} , x_{m})=(\sigma(x_{1}), {\ldots} , \sigma(x_{m})). $$

(1)

One can assemble neurons in a multi-layer network in such a way that the outputs of the neurons from a previous layer serve as the inputs for the successive level.

One can introduce parameters into the activation functions via a substitution of their variables. For example a linear change of the argument in (1) results in $\bar \sigma (Kx+B)$, where $x \in \mathbb {R}^{m}, \ K \in \mathbb {R}^{m \times m}, \ B \in \mathbb {R}^{m}$.

The output of an ANN realizes the composition of the functions, each one of which corresponds to a layer:

$$ F(x)=\bar \sigma \left( K^{[M]}\bar \sigma\left( K^{[M-1]}\left( {\ldots} \bar\sigma(K^{[1]}x+B^{[1]}) {\ldots} \right)+B^{[M-1]}\right) +B^{[M]}\right). $$

(2)

To set the classification problem we consider a finite set of objects, which are represented by the vectors $x^{i} \in \mathbb {R}^{d}, \ i \in \mathcal {I}$. Let $X=\{x^{i}| i \in \mathcal {I}\}$. There is an $\mathbb {R}^{s}$-valued classifying function $c: \mathbb {R}^{d} \to \mathbb {R}^{s}$, defined on X, which attributes to each object $x^{i}$ its “class” $c(x^{i}) \in \mathbb {R}^{s}$.

The objective of the training of an ANN amounts to the adjustment of the values of the parameters $K^{[1]}, {\ldots} , K^{[M]}, B^{[1]}, {\ldots} , B^{[M]}$ in order to achieve the best approximation of the classifying function c(x) by the output map (2). More specifically, one seeks to minimize the value of a loss function, which measures the discrepancy between the input-output map of the system and the classifying function. For example, the least-squares loss function has the form

$$ C\left( K^{[M]}, {\ldots} , K^{[1]}, B^{[M]}, {\ldots} , B^{[1]}\right)=\sum\limits_{i=1}^{N} \left\|c(x^{i})-F(x^{i}) \right\|^{2}_{2} \rightarrow \underset{K^{[j]}, B^{[j]}}{\min}. $$

(3)

Minimization of (3) results in a nonlinear programming problem, which even for a “medium” number of layers can become rather hard for classical approaches.
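The composition (2) and the loss (3) can be sketched in a few lines of code. The following is a minimal illustration only: the choice of `tanh` as the activation σ, the helper names (`sigma_bar`, `affine`, `network`, `least_squares_loss`) and the sample data are assumptions made for the example and do not come from the text.

```python
import math

def sigma_bar(x):
    """Componentwise activation as in (1); tanh is an assumed choice of sigma."""
    return [math.tanh(xi) for xi in x]

def affine(K, B, x):
    """Compute K x + B for a square matrix K (list of rows) and vectors B, x."""
    return [sum(kij * xj for kij, xj in zip(row, x)) + b for row, b in zip(K, B)]

def network(x, layers):
    """Composition (2): layers = [(K1, B1), ..., (KM, BM)], applied innermost first."""
    for K, B in layers:
        x = sigma_bar(affine(K, B, x))
    return x

def least_squares_loss(layers, samples):
    """Loss (3): sum over samples (x^i, c(x^i)) of |c(x^i) - F(x^i)|^2."""
    total = 0.0
    for x, cx in samples:
        fx = network(x, layers)
        total += sum((a - b) ** 2 for a, b in zip(cx, fx))
    return total

# Hypothetical two-layer network with identity weights and zero biases.
identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [0.0, 0.0]
layers = [(identity, zero), (identity, zero)]
samples = [([0.5, -1.0], [0.0, 0.0]), ([0.0, 2.0], [0.0, 1.0])]
loss = least_squares_loss(layers, samples)
```

Training then means searching over the entries of the pairs in `layers` for a minimizer of `least_squares_loss`.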

In this contribution we rely on a continuous-time dynamic, or residual network, model for deep learning, with a continuum of layers labelled by the time variable. The parameters involved at each layer are the values of the controls at the respective instant of time. The analogue of the composition (2) is the end-point of the trajectory, or the output, of the continuous-time control system, regarded as a function of the control.

As in the model referred to above, in the control-theoretic setting one seeks the values of the parameters (the controls) which provide the best approximation of the classifying function by the output of the control system. Precise formulations and the description of the model can be found in Section 3.

The setting allows for the application of analytic methods of dynamic optimization such as dynamic programming, Bellman’s optimality principle and Pontryagin’s maximum principle together with the corresponding numerical algorithms. This approach to deep learning has been initiated in recent years by a number of scholars; see for example [8, 11, 12] and references therein. Readers should be warned that we consider a very restricted issue: the possible application of the methods of ensemble controllability and ensemble optimal control to the problems of deep learning. Therefore, we only cite the references related to this concrete topic, leaving aside not only a huge amount of literature on deep learning, but also the literature on the application of the methods of deep learning to the problems of optimal control.

In this contribution we concentrate on finding classes of control systems which are able to guarantee approximation of the classifying functions to any prescribed accuracy. This amounts to studying the problems of ensemble controllability of the control systems and the action of the flows, generated by the control systems, on the manifold of mappings. We formulate sufficient criteria (Theorem 5.1, Corollary 5.2) of ensemble controllability and provide examples of nonlinear control systems which demonstrate the approximate controllability property in the group of diffeomorphisms of $\mathbb {R}^{n}$ (Theorem 6.3), of a torus $\mathbb {T}^{n}$ (Theorem 6.6) and of the 2-dimensional sphere $\mathbb {S}$ (Theorems 6.7 and 6.10).

2 Neural Networks Modelled by Control Systems

It is an easy task to reformulate optimization problem (2)–(3) as an optimal control problem for a discrete-time controlled dynamic system. If one sets the variables z₁, z₂, … , z_M, which satisfy the relations

$$ z_{1}=x, \ z_{j+1}=\bar \sigma \left( K^{[j+1]}z_{j}+B^{[j+1]}\right), \ j=1, {\ldots} , M-1, $$

(4)

then the map, defined by (2), coincides with the “end-point map”

$$ F(x,K^{[1]}, {\ldots} , K^{[M]}, B^{[1]}, {\ldots} ,B^{[M]})=z_{M}. $$

(5)

Alternatively one can introduce the intermediate variables y_j and define the dynamics

$$ z_{1}=x, \ y_{j+1}= K^{[j+1]}z_{j}+B^{[j+1]}, \ z_{j+1}=\bar \sigma \left( y_{j+1}\right), \ j=1, {\ldots} , M-1, $$

(6)

getting again formula (5) for the map (2).
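The recursion (6) and the terminal cost (7) can be sketched as a loop over the layers. This is an illustration under assumptions: scalar states (m = 1), `tanh` as the activation, and the hypothetical names `end_point` and `mayer_cost`; the parameter lists hold the pairs used at the steps j → j+1, exactly as indexed in (6).

```python
import math

def end_point(x, K, B):
    """Iterate (6) for scalar states; K[j], B[j] are the parameters of one step."""
    z = x                          # z_1 = x
    trajectory = [z]
    for k, b in zip(K, B):
        y = k * z + b              # y_{j+1} = K^{[j+1]} z_j + B^{[j+1]}
        z = math.tanh(y)           # z_{j+1} = sigma(y_{j+1}); tanh is an assumption
        trajectory.append(z)
    return z, trajectory

def mayer_cost(K, B, training_set):
    """Terminal cost (7): sum over the pairs (x^i, c(x^i)) of |c(x^i) - z^i_M|^2."""
    return sum((c - end_point(x, K, B)[0]) ** 2 for x, c in training_set)

# Hypothetical parameters for three steps (so M = 4 in the indexing of (6)).
K = [0.8, -1.2, 0.5]
B = [0.1, 0.0, -0.3]
z_M, traj = end_point(0.4, K, B)

# The same value written as an explicit nested composition of the layers.
nested = math.tanh(0.5 * math.tanh(-1.2 * math.tanh(0.8 * 0.4 + 0.1) + 0.0) - 0.3)

cost = mayer_cost(K, B, [(0.4, z_M), (-1.0, 0.0)])
```

The list `traj` plays the role of the discrete-time trajectory, and only its last entry enters the cost, which is what makes the problem a Mayer problem.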

Denote ${z_{j}^{i}}$ (respectively ${y_{j}^{i}},{z_{j}^{i}}$) the points of the trajectories of (4) (respectively (6)), which start with the initial data ${z^{i}_{1}}=x^{i} \in X, \ i \in \mathcal {I}$. Then, the problem of the best least square approximation (3) takes the form

$$ \hat{C}\left( K^{[2]}, {\ldots} , K^{[M]}, B^{[2]}, {\ldots} , B^{[M]}\right)=\sum\limits_{i\in \mathcal{I}} \left\|c(x^{i})-{z^{i}_{M}} \right\|^{2}_{2} \rightarrow \min. $$

(7)

Problems (4), (7) and, respectively, (6), (7) are Mayer problems of optimal control for discrete-time control systems with free end-point. Quite a few numerical algorithms have been developed for this class of problems, but we do not treat them in this contribution, focusing instead on continuous-time control systems. ^{Footnote 1}

The way to the representation of the input-output map (5)–(6) as an output of a continuous-time control system is rather straightforward. Let us consider a system, which for the sake of the computational simplicity we choose control-linear:

$$ \dot{z}=f^{0}(z)u_{0}(t)+\sum\limits_{i=1}^{r}f^{i}(z)u_{i}(t), z \in \mathbb{R}^{m}. $$

(8)

For the purpose of our illustration we choose smooth vector field f⁰(z) to be nonlinear, and the vector fields f¹(z), … , f^r(z), to form a basis of the space of the affine vector fields in $\mathbb {R}^{m}$.

Require the diffeomorphism $e^{f^{0}(z)}$ to coincide with $\bar \sigma (z)$, so that the map $\bar \sigma (z)$ is generated by control system (8), driven by the constant control u(t) = (1,0, … ,0) on a unit time interval. Each affine diffeomorphism Kx + B, with $\det K >0$, can be represented as a composition of the diffeomorphisms $e^{a(z)}$, where a(z) are affine vector fields in $\mathbb {R}^{m}$. Hence, such diffeomorphisms are generated by the control system (8), driven by piecewise-constant controls.

Therefore the composition (2) or, equivalently, the output map (5) of the discrete-time system (6) can be represented as the end-point map of the continuous-time system (8), driven by a piecewise-constant control.
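The affine part of this construction can be checked numerically in the scalar case. The sketch below is an assumption-laden illustration: it takes m = 1, uses the two affine fields z ∂/∂z and ∂/∂z, and integrates with a classical Runge-Kutta scheme; the function name `flow_piecewise` and the numbers K, B are hypothetical. The nonlinear field f⁰ is not modelled here.

```python
import math

def flow_piecewise(z0, pieces, steps_per_piece=200):
    """Integrate dz/dt = u1 * z + u2 (scalar state) with classical RK4.

    pieces is a list [(duration, (u1, u2)), ...]; the control is constant on each piece.
    """
    z = z0
    for duration, (u1, u2) in pieces:
        h = duration / steps_per_piece
        rhs = lambda w, a=u1, b=u2: a * w + b
        for _ in range(steps_per_piece):
            k1 = rhs(z)
            k2 = rhs(z + 0.5 * h * k1)
            k3 = rhs(z + 0.5 * h * k2)
            k4 = rhs(z + h * k3)
            z += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return z

K, B = 2.5, -0.7   # hypothetical affine map z -> K z + B with K > 0
pieces = [
    (1.0, (math.log(K), 0.0)),  # unit-time flow of log(K) z d/dz: z -> K z
    (1.0, (0.0, B)),            # unit-time flow of B d/dz:        z -> z + B
]
z_end = flow_piecewise(0.3, pieces)
```

Up to integration error, `z_end` agrees with `K * 0.3 + B`, so a two-piece constant control reproduces the affine layer.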

3 Ensemble Optimal Control Model for the Training of Control-Theoretic ANN

3.1 Ensemble Optimal Control Model

We consider a training set $X=\{x^{1}, {\ldots } , x^{N}\} \subset {\mathscr{M}}$, consisting of N points of a connected Riemannian manifold ${\mathscr{M}}$. In what follows ${\mathscr{M}}$ will be a submanifold of $\mathbb {R}^{d}$.

We set an optimal control model for the training process of an ANN, which involves a control system in $\mathbb {R}^{d}$, linear in the controls:

$$ \dot{x}=\sum\limits_{i=1}^{r}f^{i}(x)u_{i}(t), \ x \in \mathcal{M}. $$

(9)

We introduce the terminology of the ensembles of points. A finite ensemble of points of a smooth manifold ${\mathscr{M}}$ is an N-ple $\gamma =(x^{1}, {\ldots } , x^{N}) \in {\mathscr{M}}^{N}$, whose components $x^{j} \in {\mathscr{M}}$ are pairwise distinct: i≠j ⇒ xⁱ≠x^j. Thus if ${\Delta }^{N} \subset {\mathscr{M}}^{N}$ stands for the set of N-ples $(x^{1}, {\ldots } , x^{N}) \in {\mathscr{M}}^{N}$ with (at least) two coinciding components, then the space $\mathcal {E}_{N}({\mathscr{M}})$ of the ensembles of N points of ${\mathscr{M}}$ is the complement of ${\Delta }^{N}: \ \mathcal {E}_{N}({\mathscr{M}})={\mathscr{M}}^{N} \setminus {\Delta }^{N}={\mathscr{M}}^{(N)}$. Note that whenever $\dim {\mathscr{M}} >1, \ {\mathscr{M}}^{(N)}$ is an open connected subset and a submanifold of ${\mathscr{M}}^{N}$.

Introduce a classifying map $c: X \to \mathcal {C}$, where $\mathcal {C}$ is a connected Riemannian manifold.

Our goal is to approximate the map c by an action of the flow P_t, generated by the control system (9) which is driven by a control u(t) = (u₁(t), … , u_r(t)). The flow P_t acts on an ensemble (x¹, … , x^N) as

$$P_{t}(x^{1}, {\ldots} , x^{N})=(z_{1}(t), {\ldots} , z_{N}(t))$$

where z_k(t) are the points of the trajectories of the Cauchy problems

$$ \begin{array}{@{}rcl@{}} \dot{z}_{k}=\sum\limits_{i=1}^{r}f^{i}(z_{k})u_{i}(t), \ k=1, {\ldots} , N, \end{array} $$

(10)

$$ \begin{array}{@{}rcl@{}} z_{k}(0)=x^{k}, \ k=1, {\ldots} , N. \end{array} $$

(11)

We introduce an output map

$$ p: \mathcal{M} \to \mathcal{C} ,$$

which is a submersion in the cases we consider.

We fix T > 0 and seek to minimize

$$ \frac{1}{2}\sum\limits_{k=1}^{N}\left\|p(z_{k}(T))-c(x^{k})\right\|^{2} $$

(12)

under constraints (10)–(11).

The infimum of (12) is either positive or zero. The distinction is related to the presence or the lack of controllability of system (9) in the space of finite ensembles of points. The problems of controllability have been addressed in [4], where we provided examples of systems which are controllable in the space of finite ensembles of points. We proved that, for arbitrary N, generic r-ples of vector fields $f^{1}(z), {\ldots } , f^{r}(z) \in \text {Vect}\left (\mathbb {R}^{d}\right )$ manifest this property.

Note that even for ensemble controllable systems, the greater N is, the more complex are the controls u₁(t), … , u_r(t) needed to achieve controllability.

For this reason we opt for a tradeoff between the rate or quality of the approximation (minimization of (12)) and the complexity of the needed control, introducing the loss functional $\mathcal {J}$

$$ \mathcal{J}= \frac{1}{2}\sum\limits_{k=1}^{N}\left\|p(z_{k}(T))-c(x^{k})\right\|^{2} +\frac{\beta}{2}{{\int}_{0}^{T}}\left( {\sum}_{i=1}^{r}|u_{i}(t)|^{2}\right)dt \rightarrow \min . $$

(13)

Problem (10)–(11)–(13) is a Bolza optimal control problem with free end-point. As far as the study of the optimal control problem is concerned, we limit ourselves to the formulation (in the following subsection) of the first-order optimality condition. In the rest of the contribution we concentrate on the problems of ensemble controllability.
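For a piecewise-constant control, the functional (13) can be evaluated by integrating the N copies (10)–(11) with one and the same control. The sketch below assumes a uniform time grid, a classical Runge-Kutta step, the identity as the output map p, and two constant vector fields in the plane chosen only because the answer is then known in closed form; all function names are hypothetical.

```python
def rhs(z, u, fields):
    """Right-hand side of (10) at one point: sum_i f^i(z) u_i."""
    out = [0.0] * len(z)
    for f, ui in zip(fields, u):
        out = [o + ui * fi for o, fi in zip(out, f(z))]
    return out

def rk4_step(z, u, fields, h):
    """One RK4 step with the control u frozen on the step."""
    shift = lambda k, s: [zi + s * ki for zi, ki in zip(z, k)]
    k1 = rhs(z, u, fields)
    k2 = rhs(shift(k1, h / 2), u, fields)
    k3 = rhs(shift(k2, h / 2), u, fields)
    k4 = rhs(shift(k3, h), u, fields)
    return [zi + h * (a + 2 * b + 2 * c + d) / 6
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]

def ensemble_flow(points, controls, fields, T):
    """Apply the flow P_T to every point; controls[n] is the value on the n-th subinterval."""
    h = T / len(controls)
    zs = [list(x) for x in points]
    for u in controls:
        zs = [rk4_step(z, u, fields, h) for z in zs]
    return zs

def bolza_cost(points, targets, controls, fields, T, beta, p=lambda z: z):
    """Functional (13): terminal discrepancy plus (beta / 2) * integral of |u|^2."""
    h = T / len(controls)
    z_T = ensemble_flow(points, controls, fields, T)
    terminal = 0.5 * sum(sum((a - b) ** 2 for a, b in zip(p(z), c))
                         for z, c in zip(z_T, targets))
    energy = 0.5 * beta * h * sum(sum(ui ** 2 for ui in u) for u in controls)
    return terminal + energy

# Assumed example: translations of the plane, targets reachable by one shift.
fields = [lambda z: [1.0, 0.0], lambda z: [0.0, 1.0]]
points = [[0.0, 0.0], [1.0, 2.0], [-1.0, 0.5]]
targets = [[x + 1.0, y - 2.0] for x, y in points]
controls = [(1.0, -2.0)] * 10
J = bolza_cost(points, targets, controls, fields, T=1.0, beta=0.1)
```

In this toy case the terminal term vanishes and only the control energy contributes to `J`. Translations alone cannot match arbitrary targets, which is where the controllability questions studied below enter.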

3.2 Equations of Pontryagin Maximum Principle for Ensemble Optimal Control Problem (10)–(11)–(13)

Let z = (z₁, … , z_N), ψ = (ψ₁, … , ψ_N), u = (u₁, … , u_r), where $z_{k}\in \mathbb {R}^{d}$, $\psi _{k}\in (\mathbb {R}^{d})^{*}, \ k=1, {\ldots } , N$, $u \in \mathbb {R}^{r}$.

We introduce the pre-Hamiltonian for (10), (13)

$$ H(z,\psi,u)=\sum\limits_{i=1}^{r} F_{i}(z,\psi)u_{i}- \frac{\beta}{2}\left( \sum\limits_{i=1}^{r} {u_{i}^{2}}\right), $$

(14)

where

$$F_{i}(z,\psi)=\sum\limits_{k=1}^{N} \psi_{k} f_{i}(z_{k}), \ i=1, {\ldots} , r.$$

The adjoint equations of the corresponding pre-Hamiltonian system are

$$ \dot{\psi}_{k}=-\frac{\partial H}{\partial z_{k}}=-\psi_{k}\sum\limits_{i=1}^{r} \frac{\partial f_{i}}{\partial z}(z_{k})u_{i}(t), \ k=1, {\ldots} , N. $$

(15)

The end-point conditions for the adjoint variables are

$$ \psi_{k}(T)=-(p(z_{k}(T))-c(x^{k}))^{*}\frac{\partial p}{\partial z}(z_{k}(T)), \ k=1, {\ldots} , N. $$

(16)

According to Pontryagin’s Maximum Principle, if $\tilde u(t), \tilde z(t)$ are the optimal control and the corresponding optimal trajectory of the problem, then there must exist β ≥ 0 and an adjoint covector $\tilde \psi (t)$, which satisfy (15) and (16) and such that

$$H(\tilde z(t),\tilde \psi(t), \tilde u(t))=\underset{u}{\max} H(\tilde z(t),\tilde \psi(t), u). $$

By the maximality condition we get $\frac {\partial H}{\partial u_{i}}|_{(\tilde u(t), \tilde z(t))}=0, \ i=1, {\ldots } r$, which in the normal (β > 0) case implies:

$$ u_{i}=\beta^{-1}F_{i}(z,\psi), \ i=1, {\ldots} ,r. $$

(17)

Substituting expressions (17) into pre-Hamiltonian (14) we obtain the maximized (with respect to u) Hamiltonian

$$ M(z,\psi)=\frac{\beta^{-1}}{2}\sum\limits_{i=1}^{r}\left( F_{i}(z,\psi)\right)^{2}. $$
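The passage from (14) to (17) and to the maximized Hamiltonian is a pointwise maximization of a concave quadratic in u, and can be checked numerically. The following sketch treats the values F_i(z, ψ) as given numbers; the function names and the sample values of F and β are assumptions made for the illustration.

```python
def pre_hamiltonian(F, u, beta):
    """Pre-Hamiltonian (14) with the values F_i(z, psi) passed in as the list F."""
    return sum(Fi * ui for Fi, ui in zip(F, u)) - 0.5 * beta * sum(ui * ui for ui in u)

def maximizing_control(F, beta):
    """Stationary point (17): u_i = F_i / beta, valid in the normal case beta > 0."""
    return [Fi / beta for Fi in F]

def maximized_hamiltonian(F, beta):
    """M(z, psi) = (1 / (2 beta)) * sum_i F_i^2."""
    return 0.5 / beta * sum(Fi * Fi for Fi in F)

F, beta = [1.5, -0.4, 2.0], 0.7          # hypothetical values
u_star = maximizing_control(F, beta)
H_star = pre_hamiltonian(F, u_star, beta)

# Perturb one control component at a time; each perturbation should lower H.
perturbed = [
    pre_hamiltonian(F, [ui + d if j == i else ui for j, ui in enumerate(u_star)], beta)
    for i in range(len(F)) for d in (-0.1, 0.1)
]
```

Here `H_star` coincides with `maximized_hamiltonian(F, beta)`, and every entry of `perturbed` is smaller, as expected from the strict concavity of (14) in u.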

As we already said, minimization of the Bolza functional (13) on the set of trajectories (10)–(11) is an interesting example of a sequence of sub-Riemannian Bolza problems depending on the cardinality N of the ensemble (the training set). The complexity of optimal controls for these problems is known to grow with the cardinality. It is rather unrealistic to expect that one can integrate the Hamiltonian equations of the Maximum Principle (10)–(11)–(14)–(15)–(16)–(17) for a nonlinear control system (10) which is able to solve the interpolation or ensemble controllability problem for each N. In such cases the use of direct numerical methods for the solution of the variational problem (10)–(11)–(13) looks more realistic; the Hamiltonian (14), (17) plays an important role for the numerical methods as well. We plan to advance in this direction in our future research. In what follows we study interpolation, or ensemble controllability, problems, and more generally approximate controllability in the spaces of mappings.

4 Finite Ensemble Controllability via Lie Algebraic Methods

We approach ensemble controllability from the viewpoint of geometric control theory, in the spirit of what has been done in our previous publication [4]. See also preprint [12] where the Lie algebraic methods are applied to a different class of systems in the context of deep learning.

We start with basic definitions.

Definition 4.1 (finite ensemble controllability)

System (9) has the property of finite ensemble controllability if for each N = 1, 2, … , for each T > 0 and for any two N-ples $x_{\alpha }=(x_{\alpha }^{1}, {\ldots } , x_{\alpha }^{N})$, $x_{\omega }= (x_{\omega }^{1}, {\ldots } , x_{\omega }^{N}) \in {\mathscr{M}}^{(N)}$ there exists a control u(t) = (u₁(t), … , u_r(t)) which steers the corresponding system (10) from x_α to x_ω in time T.

Remark 4.1

If system (10) can steer the point x_α to x_ω in time T > 0 by means of a control u(t), t ∈ [0, T], then it can do the same in any time T^′ > 0 by means of the control $\frac {T}{T^{\prime }} u\left (\frac {T}{T^{\prime }}t\right ), \ t \in [0,T^{\prime }]$.
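The time rescaling in Remark 4.1 can be illustrated numerically. The sketch below uses an assumed scalar control-linear system with the fields cos(z) ∂/∂z and ∂/∂z, an arbitrary smooth control, and a classical Runge-Kutta integrator; none of these choices comes from the text.

```python
import math

def endpoint(z0, control, T, steps=2000):
    """RK4 for dz/dt = cos(z) * u1(t) + u2(t) on [0, T]; returns z(T)."""
    h = T / steps
    z = z0
    rhs = lambda t, w: math.cos(w) * control(t)[0] + control(t)[1]
    for n in range(steps):
        t = n * h
        k1 = rhs(t, z)
        k2 = rhs(t + h / 2, z + h / 2 * k1)
        k3 = rhs(t + h / 2, z + h / 2 * k2)
        k4 = rhs(t + h, z + h * k3)
        z += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return z

T, T_new = 1.0, 3.0
u = lambda t: (math.sin(3 * t), 1.0 + t)                  # hypothetical control on [0, T]
u_rescaled = lambda t: tuple(T / T_new * ui for ui in u(T / T_new * t))

z_T = endpoint(0.2, u, T)                   # end-point reached in time T
z_T_new = endpoint(0.2, u_rescaled, T_new)  # same end-point reached in time T'
```

The two end-points agree up to rounding, reflecting that z(Tt/T′) solves the system driven by the rescaled control.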

For a smooth vector field $X \in \text {Vect} {\mathscr{M}}$ consider its N-fold, the vector field on ${\mathscr{M}}^{(N)}$ defined as X^N(x¹, … , x^N) = (X(x¹), … , X(x^N)). System (10) can be given the form $\dot \gamma =X^{N}(\gamma ), \ \gamma =(x^{1}, {\ldots } , x^{N}) \in {\mathscr{M}}^{(N)}$.

For $X,Y \in \text {Vect } {\mathscr{M}}$, and N ≥ 1 we define the Lie bracket of the N-folds X^N, Y^N on ${\mathscr{M}}^{(N)}$ “componentwise”: [X^N, Y^N] = [X, Y ]^N, the N-fold of the Lie bracket [X, Y ] of X, Y on ${\mathscr{M}}$. The same holds for the iterated Lie brackets.

We denote Lie{f₁, … , f_r} the Lie algebra generated by the vector fields f₁, … , f_r, and $\text {Lie}\{{f^{N}_{1}}, {\ldots } , {f^{N}_{r}}\}$ the Lie algebra generated by their N-folds.

For the vector fields f₁, … , f_r on ${\mathscr{M}}$, their N-folds ${f_{1}^{N}}, {\ldots } , {f_{r}^{N}}$ are called bracket generating on ${\mathscr{M}}^{(N)}$ if the evaluations of the iterated Lie brackets of ${f_{1}^{N}}, {\ldots } , {f_{r}^{N}}$ at each $\gamma =(x^{1}, {\ldots } , x^{N}) \in {\mathscr{M}}^{(N)}$ span the tangent space $T_{\gamma } {\mathscr{M}}^{(N)}=\bigoplus _{j=1}^{N}T_{x^{j}}{\mathscr{M}}$. Evidently, for N > 1 the bracket generating property for ${f_{1}^{N}}, {\ldots } , {f_{r}^{N}}$ on ${\mathscr{M}}^{(N)}$ is strictly stronger than the same property for f₁, … , f_r on ${\mathscr{M}}$.
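The bracket generating condition for N-folds is a rank condition, which can be tested numerically at a given ensemble. As a hedged illustration, take the assumed pair g = ∂/∂z and f = exp(−z²/2) ∂/∂z on the real line: the iterated bracket ad_g^m f has the coefficient (d/dz)^m exp(−z²/2), and its N-fold evaluated at (x¹, … , x^N) is the vector of these coefficients at the N points. The one-dimensional setting and all names below are assumptions chosen to keep the sketch short.

```python
import math

def bracket_row(m, points):
    """N-fold of ad_g^m f at the ensemble: entries (d/dz)^m exp(-z^2/2) at each point.

    Uses (d/dz)^m exp(-z^2/2) = (-1)^m He_m(z) exp(-z^2/2), with He_m computed by
    the recurrence He_{n+1} = z He_n - n He_{n-1}.
    """
    row = []
    for x in points:
        he_prev, he = 0.0, 1.0
        for n in range(m):
            he_prev, he = he, x * he - n * he_prev
        row.append((-1) ** m * he * math.exp(-x * x / 2))
    return row

def rank(rows, tol=1e-10):
    """Rank by Gaussian elimination with partial pivoting."""
    a = [list(r) for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    rk = 0
    for col in range(n_cols):
        if rk == n_rows:
            break
        pivot = max(range(rk, n_rows), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) < tol:
            continue
        a[rk], a[pivot] = a[pivot], a[rk]
        for i in range(rk + 1, n_rows):
            factor = a[i][col] / a[rk][col]
            for j in range(col, n_cols):
                a[i][j] -= factor * a[rk][j]
        rk += 1
    return rk

distinct = [-1.0, 0.0, 0.5, 2.0]          # an ensemble of N = 4 distinct points
coinciding = [-1.0, -1.0, 0.5, 2.0]       # a point of the diagonal set
rank_distinct = rank([bracket_row(m, distinct) for m in range(4)])
rank_coinciding = rank([bracket_row(m, coinciding) for m in range(4)])
```

For the distinct points the brackets of orders 0 to 3 already span the 4-dimensional tangent space, while at coinciding points the rank drops, which is why the diagonal is removed from the space of ensembles.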

Rashevsky-Chow theorem [2] implies

Proposition 4.2

If $\dim {\mathscr{M}} >1$ and ∀N ≥ 1 the N-folds ${f_{1}^{N}}, {\ldots } , {f_{r}^{N}}$ are bracket generating on ${\mathscr{M}}^{(N)}$, then system (9) has the property of finite ensemble controllability on ${\mathscr{M}}$.

In [4] we proved that the latter property holds for each N and a generic r-ple f₁, … , f_r of vector fields. In the present context it is more convenient to check a stronger property, which implies the bracket generating property for any N.

Let us introduce the standard notation for the seminorms in the space of smooth vector fields on a manifold ${\mathscr{M}}$: for a compact $K \subset {\mathscr{M}}$ and r ≥ 0

$$ \|X\|_{r,K} =\sup_{x \in K}\left( \sum\limits_{0 \leq |\beta| \leq r }\left|D^{\beta} X(x)\right|\right), \ \|X\|_{r} =\sup_{x \in \mathcal{M}}\left( \sum\limits_{0 \leq |\beta | \leq r }\left|D^{\beta} X(x)\right|\right). $$

In the formulations of controllability results we invoke the following assumptions for the vector fields $f_{1}, {\ldots } , f_{r} \in \text {Vect}({\mathscr{M}})$, which define control system (9).

Assumption 1 (boundedness)

The vector fields f_j(x), j = 1, … , r, are $C^{\infty }$-smooth and bounded on ${\mathscr{M}}$ together with their covariant derivatives of each order.

Assumption 2 (Lie algebra approximating property)

A system of smooth vector fields $f_{1}, {\ldots } , f_{r} \in \text {Vect}({\mathscr{M}})$ demonstrates the Lie algebra approximating property, if ∃m ≥ 1 such that for each C^m-smooth vector field $Y \in \text {Vect}({\mathscr{M}})$ and each compact $K \subset {\mathscr{M}}$ there holds:

$$ \inf\left\{\left\|Y-X\right\|_{0,K}| \ X \in \text{Lie}\{ f_{1}, {\ldots} , f_{r}\} \right\}=0.$$

We show that this property suffices to guarantee finite ensemble controllability.

Theorem 4.3 (Lie algebra approximating property and finite ensemble controllability)

If $\dim {\mathscr{M}} >1$ and the vector fields f₁, … , f_r meet Assumptions 1 and 2, then ∀N ≥ 1 system (10) is controllable in the space $\mathcal {E}_{N}({\mathscr{M}})$ of ensembles of N points.

Proof

Fix N. Choose an ensemble $\gamma =(x^{1}, {\ldots } , x^{N}) \in {\mathscr{M}}^{(N)}$. We prove that the N-folds ${f_{1}^{N}}, {\ldots } , {f_{r}^{N}}$ are bracket generating at γ.

Pick m for which the Lie algebra approximating property holds. Consider the space $\text {Vect}^{m}({\mathscr{M}})$ of C^m-smooth vector fields on ${\mathscr{M}}$ and define for each $\gamma \in {\mathscr{M}}^{(N)}$ the evaluation map $E_{\gamma } :\text {Vect}^{m}({\mathscr{M}}) \mapsto T_{\gamma } {\mathscr{M}}^{(N)} $:

$$E_{\gamma} (Y)=Y^{N}(\gamma)=\left( Y(x^{1}), {\ldots} , Y(x^{N})\right). $$

This linear map is obviously surjective and continuous with respect to C⁰-metric in $\text {Vect}^{m}({\mathscr{M}})$. By virtue of Assumption 2 the image $E_{\gamma }\left (\text {Lie}\{f_{1}, {\ldots } , f_{r}\}\right )$ is a dense linear subspace of $T_{\gamma } {\mathscr{M}}^{(N)}$ and hence must coincide with it. □

Remark 4.2

Below we provide formulations for specific cases in which $\dim {\mathscr{M}}=1$.

5 Lie Algebra Strong Approximating Property: Controllability in the Diffeomorphism Groups and the Manifolds of Mappings

In the previous section we dealt with finite ensembles of points. In this section we show that if a stronger approximating property holds for the Lie algebra Lie{f₁, … , f_r}, associated to control system (9), then approximate controllability of system (9) holds in the group $\text {Diff}^{c}_{0}$ of diffeomorphisms on ${\mathscr{M}}$ and on the manifolds of smooth mappings of ${\mathscr{M}}$.

In our proofs we make occasional use of a few notations of chronological calculus for the flows generated by time-dependent vector fields [1]. In particular, for a vector field X_t(x), which is smooth in x and locally integrable in t, we denote by $\overset {\longrightarrow }{\exp }{\int \limits }_{t_{0}}^{t}X_{s}ds$ the flow P_t, generated by the time-dependent differential equation $\dot x=X_{t}(x), \ P_{t_{0}}=I$. If X_t is time independent: X_t(x) ≡ X(x), then the flow is denoted by $P_{t}=e^{(t-t_{0})X}$. A brief presentation of the chronological calculus can be found in [2].

The following definition has been used in [4]. Put for ℓ > 0 and a compact $K \subset {\mathscr{M}}$:

$$\text{Lie}^{\ell}_{1,K}\{f_{1}, {\ldots} , f_{r}\}=\left\{X(x) \in \text{Lie}\{f_{1}, {\ldots} , f_{r}\}\left| \ \|X\|_{1,K} < \ell \right.\right\}.$$

Assumption 3 (Lie algebra strong approximating property)

A system of smooth vector fields $f_{1}, {\ldots } , f_{r} \in \text {Vect}({\mathscr{M}})$ possesses Lie algebra strong approximating property, if ∃m ≥ 1, such that for each C^m-smooth vector field $Y \in \text {Vect}({\mathscr{M}})$ and each compact $K \subset {\mathscr{M}} \ \exists \ell >0$ for which:

$$ \inf \left\{\left. \sup_{x \in K }\left|Y(x)-X(x) \right| \ \right| \ X \in \text{Lie}^{\ell}_{1,K}\left\{f_{1}, {\ldots} , f_{r}\right\}\right\}=0. $$

(18)

Denote by $\text {Diff}^{c}_{0}$ the connected component of the identity of the group of the compactly supported diffeomorphisms of ${\mathscr{M}}$.

Theorem 5.1 (C ⁰-approximate controllability in the group of diffeomorphisms)

Let $\hat P \in \text {Diff}^{c}_{0}({\mathscr{M}})$. Let $C^{\infty }$-smooth vector fields f_j(x), j = 1, … , r, meet Assumptions 1 and 3. Then for each $K \subset {\mathscr{M}}$ and each ε > 0 there exists a control u(t) = (u₁(t), … , u_r(t)), t ∈ [0, T], such that for the corresponding flow

$$ P_{t}= {\overset{\longrightarrow}{\exp}{\int}_{0}^{t}} \left( \sum\limits_{j=1}^{r} f_{j}(x)u_{j}(\tau)\right)d\tau , \ x \in \mathcal{M} $$

(19)

generated by system (9), the diffeomorphism P_T ε-approximates $\hat P$ in C⁰ on K: $\left \|\hat P-P_{T}\right \|_{0,K} < \varepsilon . $

Proof

Join the identity I with $\hat P$ by a curve $t \mapsto \hat P_{t}(x), \ t \in [0,T]$ in $\text {Diff}^{c}_{0}({\mathscr{M}})$. Without loss of generality we may assume that $(t,x) \mapsto \hat P_{t}(x)$ is C¹-smooth. The curve $t \mapsto \hat P_{t}(x)$ can be represented as a flow $\hat P_{t}={\overset {\longrightarrow }{\exp }{\int \limits }_{0}^{t}} Y_{\tau } d\tau $, generated by a non-autonomous vector field Y_t, which is continuous in t; one can take $Y_{t}(x)=(\hat P_{t})^{-1}_{*} \frac {d\hat P_{t}}{dt}(x)$.

Denote by K_t, t ∈ [0, T], the images of a compact set K under the flow $\hat P_{t}$. Since for each t ∈ [0, T] condition (18) holds for the vector field Y_t and control system (9), one can apply Theorem 4.3 of [4] to the vector field Y_t, the diffeotopy K_t, t ∈ [0, T], and system (9). According to this theorem, for each ε > 0 there exists a control u(t) = (u₁(t), … , u_r(t)), t ∈ [0, T], such that for the flow (19)

$$ \underset{x \in K}{\sup}\left\|\hat P(x) -P_{T}(x) \right\|_{0,K} < \varepsilon .$$

□

The approximation result, we have just proved, can be extended from diffeomorphisms of ${\mathscr{M}}$ to a broader class of continuous maps $\varphi :{\mathscr{M}} \to \mathcal {C}$.

One of possible constructions can be realized on the manifold ${\mathscr{M}} \times \mathcal {C}$. Consider the projection $p: {\mathscr{M}} \times \mathcal {C} \to \mathcal {C}$ and a diffeomorphic immersion $\imath : {\mathscr{M}} \to {\mathscr{M}} \times \mathcal {C} $. We opt for $\imath (x)=(x,\nu ), \ \forall x \in {\mathscr{M}}$, where ν is a selected point of $\mathcal {C} $. Let the metric d on ${\mathscr{M}} \times \mathcal {C}$ be defined by $d=d_{{\mathscr{M}}}+d_{\mathcal {C}}$.

Let $\varphi :{\mathscr{M}} \to \mathcal {C}$ be a continuous mapping which is approximately C¹-smoothly homotopic to the constant mapping φ₀(x) = ν. This means that in any C⁰-neighborhood of φ there are C¹-smooth functions $\hat \varphi $, which are contractible to the constant function by C¹-smooth homotopies $\hat \varphi _{t}(x), \ t \in [0,1]$:

$$ \hat \varphi_{0}(x) \equiv \nu, \ \hat \varphi_{1}(x)=\hat \varphi (x).$$

Without loss of generality we can limit ourselves to the case in which $\varphi =\hat \varphi $ is C¹-smooth and C¹-smoothly homotopic to the constant function. Consider the graphs of the mappings $\varphi _{t}(x): \ {\varGamma }_{t} = \{(x,\varphi _{t}(x)), \ x \in {\mathscr{M}}\} \subset {\mathscr{M}}\times \mathcal {C}$. For each t the sets Γ_t are diffeomorphic to Γ₀ and to ${\mathscr{M}}$. The flow $\hat P_{t}$, generated on the manifold ${\mathscr{M}}\times \mathcal {C}$ by the vector field $\frac {\partial \varphi _{t}(x)}{\partial t}\frac {\partial }{\partial c}$, defines the diffeotopy of the graphs:

$${\varGamma}_{t}=\hat P_{t}({\varGamma}_{0}), \ t \in [0,1]; \ \hat P_{1}(x,\nu)=(x, \varphi(x)), \ \forall x \in \mathcal{M}.$$

Let control system (9), defined now on ${\mathscr{M}} \times \mathcal {C}$, possess the Lie algebra strong approximating property. By the previous theorem for each compact $K \subset {\mathscr{M}}$ and each ε > 0 there exists a control u(⋅) = (u₁(⋅), … , u_r(⋅)), such that for the flow

$$ P_{t}= {\overset{\longrightarrow}{\exp}{\int}_{0}^{t}} \left( \sum\limits_{j=1}^{r} f_{j}(x)u_{j}(\tau)\right)d\tau, \ x \in \mathcal{M} \times \mathcal{C} $$

(20)

there holds $\|P_{1}- \hat P_{1}\|_{0,K \times \{\nu \}}<\varepsilon $. Then

$$\forall x \in K: \ \varepsilon >d_{\mathcal{C}}(p \circ P_{1}(x,\nu), p \circ \hat P_{1}(x,\nu))=d_{\mathcal{C}}(p \circ P_{1}(x,\nu),\varphi(x)) $$

and we conclude with the corollary.

Corollary 5.2

Let control system (9), defined on ${\mathscr{M}} \times \mathcal {C}$, meet Assumptions 1 and 3. Then the system is C⁰-approximately controllable on the manifold of mappings: for each continuous mapping $\varphi :{\mathscr{M}} \to \mathcal {C}$, which is approximately smoothly homotopic to a constant, each ε > 0 and each compact $K \subset {\mathscr{M}}$ there exists u(t) = (u₁(t), … , u_r(t)), t ∈ [0, T], such that for the corresponding flow (20) on ${\mathscr{M}} \times \mathcal {C}$ there holds $\left \|\varphi (x) -p \circ P_{T} \circ \imath (x) \right \|_{0,K} < \varepsilon . $

Recent publication [6] contains an interesting example of five polynomial vector fields X¹, … , X⁵ in $\mathbb {R}^{n}$, which generate the Lie algebra of all the polynomial vector fields in $\mathbb {R}^{n}$ and as a consequence guarantee, in the terminology of [6], the “universal interpolation property”. Since polynomial vector fields C^m-approximate each C^m-smooth vector field on compact sets, X¹, … , X⁵ possess the Lie algebra strong approximating property, i.e. meet Assumption 3 (and hence Assumption 2). Then by Theorem 4.3, the control system $ \dot x=X^{1}u_{1}(t)+ {\cdots } +X^{5}u_{5} (t)$ possesses the property of ensemble controllability for finite ensembles of points (equivalent to the “universal interpolation property”), and moreover by Theorem 5.1 possesses the property of C⁰-approximate controllability in the group of diffeomorphisms.

An interesting algebraic question consists of finding a minimal collection of the polynomial vector fields in $\mathbb {R}^{n}$, which would meet Assumption 3 or more particularly generate a Lie algebra of all the polynomial vector fields. From our treatment of the case of a 2D sphere, accomplished in the next section, one gets an idea that the representation theory for GL(n) can be instrumental for a construction of such minimal collection.

6 Ensemble Controllable Systems on Euclidean Spaces $\mathbb {R}^{d}$, tori $\mathbb {T}^{d}$ and the 2-dimensional Sphere $\mathbb {S}$

In this section we consider several manifolds, such as Euclidean spaces $\mathbb {R}^{d}$, d-dimensional tori $\mathbb {T}^{d}$ and 2-dimensional sphere $\mathbb {S}$. We provide examples of control systems on the manifolds, which possess controllability properties for finite ensembles and properties of approximate controllability in the group of diffeomorphisms of the manifolds.

For the sake of brevity, throughout this section we call a system ensemble controllable if the conclusions of Theorems 4.3 and 5.1 hold for it.

The key point of the proofs is the verification of the Lie algebra strong approximating condition. Such a verification involves two issues. First, we have to establish a kind of “Lie rank condition”: the approximability of the vector fields by the vector fields from Lie{f₁, … , f_r}. The second issue is the regularity of these approximations, including boundedness of the derivatives of the approximants.

6.1 Ensemble Controllable System in ${\mathbb R}^{d}$

Consider control-linear system in $\mathbb {R}^{d}$:

$$ \dot{z}=\sum\limits_{i=1}^{d}f_{i}(z)u_{i} + \sum\limits_{i=1}^{d}g_{i}(z)v_{i}, \ z \in \mathbb{R}^{d} , $$

(21)

where

$$ f_{i}(z)=e^{-\gamma(z)}\frac{\partial}{\partial z_{i}}, \ g_{i}=\frac{\partial}{\partial z_{i}}, \ i=1, {\ldots} , d, $$

(22)

and

$$ \gamma(z)=\frac{\langle z, z \rangle}{2}=\frac{{z_{1}^{2}}+{\cdots} +{z_{d}^{2}}}{2}.$$

Putting z = (z₁, … , z_d), u = (u₁, … , u_d), v = (v₁, … , v_d) we represent (21)–(22) in a vectorial form

$$ \dot{z}=e^{-\gamma(z)}u+v, \ z,u,v \in \mathbb{R}^{d}. $$

(23)

We call it the GH system, since the Gaussian density function $e^{-\gamma (z)}$ and Hermite polynomials play an important role in its study.

We consider the action of system (23) onto an ensemble of points $(x^{1}, {\ldots } , x^{N}) \in \left (\mathbb {R}^{d}\right )^{N}$. To establish the property of ensemble controllability we verify the Lie algebra strong approximation condition for GH system.
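The action of system (23) on an ensemble can be simulated directly. The sketch below is only an illustration: it assumes d = 2, constant controls u and v on [0, T], a classical Runge-Kutta integrator, and the hypothetical names `gh_rhs` and `gh_flow`.

```python
import math

def gh_rhs(z, u, v):
    """Right-hand side of (23): exp(-|z|^2 / 2) * u + v."""
    w = math.exp(-0.5 * sum(zi * zi for zi in z))
    return [w * ui + vi for ui, vi in zip(u, v)]

def gh_flow(points, u, v, T, steps=1000):
    """Move every point of the ensemble by the same constant controls u, v (RK4)."""
    h = T / steps
    result = []
    for x in points:
        z = list(x)
        for _ in range(steps):
            k1 = gh_rhs(z, u, v)
            k2 = gh_rhs([zi + h / 2 * k for zi, k in zip(z, k1)], u, v)
            k3 = gh_rhs([zi + h / 2 * k for zi, k in zip(z, k2)], u, v)
            k4 = gh_rhs([zi + h * k for zi, k in zip(z, k3)], u, v)
            z = [zi + h * (a + 2 * b + 2 * c + d) / 6
                 for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
        result.append(z)
    return result

ensemble = [[0.0, 0.0], [0.0, 3.0]]

# With u = 0 the system is a pure translation: every point moves by v * T.
translated = gh_flow(ensemble, u=[0.0, 0.0], v=[1.0, 2.0], T=1.0)

# With v = 0 the Gaussian weight moves points near the origin much more than distant ones.
deformed = gh_flow(ensemble, u=[1.0, 0.0], v=[0.0, 0.0], T=1.0)
```

The controls v alone keep the shape of the ensemble rigid, whereas the controls u displace points at different rates; combining the two is what gives the system a chance to steer ensembles.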

Proposition 6.1

Vector fields (22) meet Assumptions 1 and 3.

Proof

Direct computation of the iterated Lie brackets of vector fields (22) gives

$$\text{ad}^{m_{j1}}_{g_{1}} {\cdots} \text{ad}^{m_{jd}}_{g_{d}} f_{j}(z) = \frac{\partial^{m_{j}}e^{-\gamma(z)}}{\partial z_{1}^{m_{j1}}{\ldots} \partial z_{d}^{m_{jd}}} \frac{\partial}{\partial z_{j}}, \ m_{j}=m_{j1}+ {\cdots} + m_{jd} . $$

As is well known,

$$ \frac{\partial^{m_{j}}e^{-\gamma(z)}}{\partial z_{1}^{m_{j1}}{\ldots} \partial z_{d}^{m_{jd}}}=(-1)^{m_{j}}H_{m_{j1}, {\ldots} , m_{jd}}(z)e^{-\gamma(z)}, \ z=(z_{1}, {\ldots} , z_{d}), $$

(24)

where $H_{m_{j1}, {\ldots } , m_{jd}}(z_{1}, {\ldots } , z_{d})$ are multivariate Hermite polynomials. Thus for each j = 1, … , d and each Hermite polynomial $H_{m_{j1}, {\ldots } , m_{jd}}(z)$ the vector field $H_{m_{j1}, {\ldots } , m_{jd}}(z)e^{-\gamma (z)}\frac {\partial }{\partial z_{j}}$ belongs to the Lie algebra generated by vector fields (22).
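Identity (24) can be checked numerically in one variable; the multivariate case reduces to it because the Gaussian factor and the multivariate Hermite polynomials are products of univariate ones. The sketch below assumes the probabilists' normalization of Hermite polynomials (the one matching the weight $e^{-z^{2}/2}$) and verifies by central differences that differentiating $(-1)^{n}H_{n}(z)e^{-z^{2}/2}$ gives the same expression of order n + 1. The helper names are illustrative.

```python
import math

def hermite_he(n, x):
    """Probabilists' Hermite polynomial He_n(x) via He_{k+1} = x He_k - k He_{k-1}."""
    prev, cur = 1.0, x
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, x * cur - k * prev
    return cur

def gauss_hermite(n, x):
    """(-1)^n He_n(x) exp(-x^2 / 2), the claimed n-th derivative of exp(-x^2 / 2)."""
    return (-1) ** n * hermite_he(n, x) * math.exp(-0.5 * x * x)

def central_diff(func, x, h=1e-5):
    """Second-order central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

def max_identity_error(max_order, xs):
    """Largest gap between d/dx of the order-n expression and the order-(n+1) one."""
    worst = 0.0
    for n in range(max_order):
        for x in xs:
            numeric = central_diff(lambda t: gauss_hermite(n, t), x)
            worst = max(worst, abs(numeric - gauss_hermite(n + 1, x)))
    return worst

error = max_identity_error(5, [-2.0, -0.5, 0.0, 0.7, 1.5])
```

A small value of `error` is consistent with (24) up to discretization error; it is a sanity check, not a proof.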

Hermite polynomials $\{H_{m_{1},{\ldots } ,m_{d}}(z_{1}, {\ldots } , z_{d})| \ m_{1} \geq 0 , {\ldots } , m_{d} \geq 0\}$ form a complete orthogonal system in $L_{2}(\mathbb {R}^{d})$ with respect to the weighted scalar product

$$ \langle f , g \rangle= \frac{1}{(2\pi)^{d/2}} {\int}_{\mathbb{R}^{d}}f(z)g(z)e^{-\gamma (z)}dz . $$

Any function from $L_{2}(\mathbb {R}^{d})$ can be expanded into an $L_{2}$-convergent series in Hermite polynomials. To verify the Lie algebra strong approximating condition one has to prove that for each sufficiently smooth vector field $Y(z)={\sum }_{j=1}^{d}Y_{j}(z)\frac {\partial }{\partial z_{j}}$ with compact support in ${\mathbb R}^{d}$ there exists ℓ > 0 such that for each j = 1, … , d and each ε > 0 one can find a linear combination X_j of the functions (24) for which

$$ \|X_{j}\|_{1,K} \leq \ell , \ \|X_{j} - Y_{j}\|_{0,K} \leq \varepsilon .$$

Suppose that $Y(z)$ is $C^{[\frac {d}{2}]+2}$-smooth. Pick its component $Y_{j}(z)$ and consider the orthogonal expansion of the function $Y_{j}(z)e^{\gamma (z)}$ in Hermite polynomials:

$$ Y_{j}(z) e^{\gamma (z)} \sim \sum\limits_{m} c_{m} H_{m}(z), \ m=(m_{1}, {\ldots} , m_{d}) \in \mathbb{N}^{d}. $$

(25)

For |m| = m₁ + ⋯ + m_d let $S_{n}(z)={\sum }_{m: \ |m| \leq n} c_{m} H_{m}(z)$ be a partial sum of this expansion. The Lie algebra strong approximating condition is implied by the following lemma.

Lemma 6.2

If $Y_{j}(z)$ is $C^{\left [\frac {d}{2}\right ]+2}$-smooth, then the functions $S_{n}(z)e^{-\gamma (z)}$ converge uniformly to $Y_{j}(z)$ as $n \to \infty $, while $\frac {\partial }{\partial z_{i}}\left (S_{n}(z)e^{-\gamma (z)}\right )$ converge uniformly to $\frac {\partial Y_{j}(z)}{\partial z_{i}}$ and hence are bounded by a constant ℓ independent of n.

The proof of the lemma can be found in the Appendix (Section A.1). □

By virtue of Theorems 4.3 and 5.1 and Proposition 6.1 there holds

Theorem 6.3 (ensemble controllability of GH system)

i) For d > 1 system (23) is ensemble controllable on ${\mathscr{M}}=\mathbb {R}^{d}$;

ii) For ${\mathscr{M}}=\mathbb {R}$ system (23) is approximately controllable in the group of diffeomorphisms $\text {Diff}^{c}_{0}(\mathbb {R})$;

iii) For ${\mathscr{M}}=\mathbb {R}$ system (23) can transform a finite ensemble $\left (x^{1}_{\alpha } , {\cdots } , x^{N}_{\alpha }\right )$ into another ensemble $\left (x^{1}_{\omega } , {\cdots } , x^{N}_{\omega }\right )$ if and only if they are equally ordered: $x^{i}_{\alpha } < x^{j}_{\alpha } \Leftrightarrow x^{i}_{\omega } < x^{j}_{\omega } ,\ \forall i,j$.
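The ordering condition in item iii) is easy to test for concrete data. A minimal sketch follows; the function name `equally_ordered` and the list-based representation of an ensemble are assumptions made for illustration.

```python
def equally_ordered(initial, target):
    """True if x_i < x_j holds in `initial` exactly when it holds in `target`."""
    n = len(initial)
    if len(target) != n:
        return False
    return all((initial[i] < initial[j]) == (target[i] < target[j])
               for i in range(n) for j in range(n))

# The first pair preserves the order of the points, the second swaps two of them.
same_order = equally_ordered([0.0, 1.0, 3.0], [-5.0, 2.0, 2.5])
swapped = equally_ordered([0.0, 1.0, 3.0], [1.0, 0.0, 3.0])
```

The check is quadratic in the ensemble size, which is adequate for a sketch; comparing the sorting permutations of the two ensembles would do the same in O(N log N).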

6.2 Ensemble Controllability on the Tori $\mathbb {T}^{d}$

We start with d = 1. Consider the control-linear system on ${\mathbb T}^{1}$:

$$ \dot{\varphi}=u_{0}+u_{1} \sin \varphi + u_{2} \sin 2\varphi , $$

(26)

generated by the vector fields

$$ f_{0}(\varphi)= \frac{ \partial}{ \partial \varphi}, \ f_{1}(\varphi)=\sin \varphi \frac{ \partial}{ \partial \varphi}, \ f_{2}(\varphi)=\sin 2\varphi \frac{ \partial}{ \partial \varphi}. $$

(27)

Here φ is the angle coordinate on ${\mathbb T}^{1}$.

The action of system (26) on an ensemble of N points $(\varphi ^{1}_{\alpha }, {\ldots } , \varphi ^{N}_{\alpha })$ on ${\mathbb T}^{1}$ is defined by the equations

$$ \begin{array}{@{}rcl@{}} \dot{\varphi}^{j}=u_{0}(t)+u_{1}(t) \sin \varphi^{j} + u_{2}(t) \sin 2\varphi^{j}, \ j=1, {\ldots} , N, \\ \varphi^{j}(0)=\varphi^{j}_{\alpha} \end{array} $$

Lemma 6.4

Vector fields (27) meet Assumptions 1 and 3.

Proof

Boundedness is obvious.

We prove that the Lie algebra Lie{f₀, f₁, f₂}, generated by vector fields (27), contains the vector fields ${\sin \limits } k\varphi \frac {\partial }{\partial \varphi }, \ {\cos \limits } k\varphi \frac {\partial }{\partial \varphi }, k=1, 2 , {\ldots } $.

Since $\left [ \frac {\partial }{\partial \varphi }, {\sin \limits } k\varphi \frac {\partial }{\partial \varphi } \right ]=k {\cos \limits } k\varphi \frac {\partial }{\partial \varphi }$, it suffices to prove that ${\sin \limits } k\varphi \frac {\partial }{\partial \varphi }$, k ≥ 1, are contained in Lie{f₀, f₁, f₂}. This can be done by induction on k, given that for k > 1

$$\left[ \sin \varphi \frac{\partial}{\partial \varphi}, \sin k\varphi \frac{\partial}{\partial \varphi} \right]=\frac{1}{2}\left( (k-1) \sin ((k+1)\varphi)- (k+1) \sin ((k-1)\varphi) \right)\frac{\partial}{\partial \varphi} . $$

Consider a vector field $Y(\varphi )\frac {\partial }{\partial \varphi }$ on $\mathbb {T}^{1}$ together with its Fourier expansion

$$Y(\varphi)\frac{\partial}{\partial \varphi} \sim \frac{a_{0}}{2}\frac{\partial}{\partial \varphi} + \sum\limits_{k=1}^{\infty}\left( a_{k} \cos k\varphi \frac{\partial}{\partial \varphi}+b_{k} \sin k\varphi \frac{\partial}{\partial \varphi}\right). $$

By the above, the partial sums of the series belong to Lie{f₀, f₁, f₂}. If Y (φ) is C²-smooth, the partial sums S_n(φ) of the Fourier series converge uniformly to Y (φ) as $n \to \infty $. The derivatives $S^{\prime }_{n}(\varphi )$ converge uniformly to $Y^{\prime }(\varphi )$ and hence are equibounded, from which the Lie algebra strong approximating condition follows. □
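The induction step in the proof rests on the bracket of $\sin \varphi \frac{\partial}{\partial \varphi}$ with $\sin k\varphi \frac{\partial}{\partial \varphi}$. The sketch below checks numerically, under the convention that the bracket of $a\frac{\partial}{\partial \varphi}$ and $b\frac{\partial}{\partial \varphi}$ has coefficient $ab' - ba'$, that this coefficient equals $\frac{1}{2}\left((k-1)\sin ((k+1)\varphi) - (k+1)\sin ((k-1)\varphi)\right)$. The function names are illustrative.

```python
import math

def bracket_coefficient(k, phi):
    """Coefficient of d/dphi in [sin(phi) d/dphi, sin(k phi) d/dphi].

    For fields a(phi) d/dphi and b(phi) d/dphi the bracket is (a b' - b a') d/dphi.
    """
    a, da = math.sin(phi), math.cos(phi)
    b, db = math.sin(k * phi), k * math.cos(k * phi)
    return a * db - b * da

def closed_form(k, phi):
    """Product-to-sum form of the same coefficient."""
    return 0.5 * ((k - 1) * math.sin((k + 1) * phi)
                  - (k + 1) * math.sin((k - 1) * phi))

samples = [0.1 * n for n in range(63)]
max_gap = max(abs(bracket_coefficient(k, phi) - closed_form(k, phi))
              for k in range(2, 8) for phi in samples)
```

Since the coefficient of $\sin ((k+1)\varphi)$ is nonzero for k > 1, each bracket produces the next harmonic from the previous ones, which is what the induction needs.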

To extend the construction to the d-dimensional torus $\mathbb {T}^{d}=\mathbb {T}^{1} \times {\cdots } \times \mathbb {T}^{1}$ we introduce the coordinates φ₁, … , φ_d in $\mathbb {T}^{d}$ and define the vector fields

$$ \begin{array}{@{}rcl@{}} {f^{0}_{i}}=\frac{\partial}{\partial \varphi_{i}}, \ {f^{1}_{i}}=\sin \varphi_{i} \frac{\partial}{\partial \varphi_{i}}, \ {f^{2}_{i}}= \sin 2\varphi_{i} \frac{\partial}{\partial \varphi_{i}}, \ i=1, {\ldots} , d; \end{array} $$

(28)

$$ \begin{array}{@{}rcl@{}} g_{i}=\left( \sum\limits_{j=1}^{d} \sin \varphi_{j} \right) \frac{\partial}{\partial \varphi_{i}}, \ i=1, {\ldots} , d. \end{array} $$

Consider the control-linear system

$$ \dot{\varphi}_{k}=u_{0k}+\sin \varphi_{k} u_{1k}+\sin 2\varphi_{k} u_{2k} +\left( \sum\limits_{j=1}^{d} \sin \varphi_{j} \right)v_{k}, \ k=1, {\ldots} d. $$

(29)

Lemma 6.5

The Lie algebra ${\mathscr{L}}_{\mathbb {T}^{d}}$ generated by vector fields (28) contains all the monomial vector fields of the form

$$ \left( \prod\limits_{i \in \mathcal{I}} \cos k_{i} \varphi_{i} \prod\limits_{i \in \mathcal{I}^{c}} \sin k_{i} \varphi_{i} \right)\frac{\partial}{\partial \varphi_{j}}, \ j=1, {\ldots} , d $$

(30)

where $\mathcal {I} \cup \mathcal {I}^{c} =\{1, {\ldots } , d\}, \ \mathcal {I} \cap \mathcal {I}^{c}=\emptyset $.

Proof

By the previous lemma the monomial vector fields ${\cos \limits } k_{i}\varphi _{i} \frac {\partial }{\partial \varphi _{i}}$, ${\sin \limits } k_{i}\varphi _{i} \frac {\partial }{\partial \varphi _{i}}, \ i=1, {\ldots } , d,$ belong to the Lie algebra. So do the vector fields

$$ \left[{f^{0}_{j}}, g_{i}\right]= \cos \varphi_{j} \frac{\partial}{\partial \varphi_{i}}, \ \left[{f^{0}_{j}},\left[{f^{0}_{j}}, g_{i}\right]\right]= -\sin \varphi_{j} \frac{\partial}{\partial \varphi_{i}} $$

for i≠j.

If ${\sin \limits } k \varphi _{j} \frac {\partial }{\partial \varphi _{i}}$ for k ≤ l belong to ${\mathscr{L}}_{\mathbb {T}^{d}}$, then

$$ \left[f^{1}_{j}, \sin l \varphi_{j} \frac{\partial}{\partial \varphi_{i}} \right] =\frac{l}{2}\left( \sin ((l+1)\varphi_{j})\frac{\partial}{\partial \varphi_{i}} - \sin((l-1)\varphi_{j})\frac{\partial}{\partial \varphi_{i}} \right) $$

and by induction in l we conclude that all the monomial vector fields ${\cos \limits } l\varphi _{j} \frac {\partial }{\partial \varphi _{i}}$, ${\sin \limits } l\varphi _{j} \frac {\partial }{\partial \varphi _{i}}, \ i,j=1, {\ldots } , d,$ belong to ${\mathscr{L}}_{\mathbb {T}^{d}}$.

We define the degree of a monomial vector field (30) as the cardinality of the set {i ∈{1, … , d}|k_i≠ 0} and proceed by induction in the degree. Each monomial vector field of degree s + 1 is either $M(\varphi ){\cos \limits } k_{\alpha } \varphi _{\alpha } \frac {\partial }{\partial \varphi _{j}} $ or $M(\varphi ){\sin \limits } k_{\alpha } \varphi _{\alpha } \frac {\partial }{\partial \varphi _{j}}$, where M(φ) has degree s and does not depend on φ_α.

In the first case, if M(φ) does not depend on φ_j, then

$$\left[M(\varphi)\frac{\partial}{\partial \varphi_{\alpha}},\sin k_{\alpha}\varphi_{\alpha}\frac{\partial}{\partial \varphi_{j}}\right]= k_{\alpha} M(\varphi)\cos k_{\alpha} \varphi_{\alpha} \frac{\partial}{\partial \varphi_{j}}.$$

If M(φ) depends on φ_j, then α≠j and one can easily find a monomial M₁(φ) of degree s such that $\frac {\partial }{\partial \varphi _{j}} M_{1}(\varphi )=M(\varphi )$. Then

$$\left[ \cos k_{\alpha} \varphi_{\alpha} \frac{\partial}{\partial \varphi_{j}}, M_{1}(\varphi) \frac{\partial}{\partial \varphi_{j}}\right]=M(\varphi)\cos k_{\alpha} \varphi_{\alpha} \frac{\partial}{\partial \varphi_{j}}.$$

In this way we conclude the step of induction and the proof. □
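The brackets used in the proof can be sanity-checked numerically. The following sketch, written for d = 2, approximates the Lie bracket $[X, Y] = (DY)X - (DX)Y$ by central differences; this sign convention is an assumption chosen to agree with the bracket formulas of this section, and the helper `lie_bracket` as well as the sample point are illustrative.

```python
import math

def lie_bracket(X, Y, point, h=1e-5):
    """Components of [X, Y] = (DY) X - (DX) Y at `point`, by central differences."""
    d = len(point)
    x_val, y_val = X(point), Y(point)
    result = [0.0] * d
    for a in range(d):
        plus = list(point)
        plus[a] += h
        minus = list(point)
        minus[a] -= h
        dY = [(p - m) / (2 * h) for p, m in zip(Y(plus), Y(minus))]
        dX = [(p - m) / (2 * h) for p, m in zip(X(plus), X(minus))]
        for c in range(d):
            result[c] += x_val[a] * dY[c] - y_val[a] * dX[c]
    return result

# Fields on the 2-torus in coordinates (phi_1, phi_2), given by their components.
f0_2 = lambda p: [0.0, 1.0]                             # d/dphi_2
f1_2 = lambda p: [0.0, math.sin(p[1])]                  # sin(phi_2) d/dphi_2
g_1 = lambda p: [math.sin(p[0]) + math.sin(p[1]), 0.0]  # (sin phi_1 + sin phi_2) d/dphi_1
mixed = lambda p: [math.sin(2 * p[1]), 0.0]             # sin(2 phi_2) d/dphi_1

point = [0.4, 1.3]
first = lie_bracket(f0_2, g_1, point)     # compare with cos(phi_2) d/dphi_1
second = lie_bracket(f1_2, mixed, point)  # compare with (sin 3phi_2 - sin phi_2) d/dphi_1
```

The second bracket is the case l = 2 of the induction formula, where the factor l/2 equals 1.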

The Lie algebra strong approximating property for (29) follows from the lemma by classical approximation results for multivariate trigonometric polynomials.

Regarding the criteria of finite ensemble controllability, the case of $\mathbb {T}^{1}$ has a peculiarity. For a given orientation of $\mathbb {T}^{1}$, any ensemble of N points on $\mathbb {T}^{1}$ is ordered up to a cyclic permutation. Two ensembles are equally ordered if the sequences of their indices are the same up to a cyclic permutation.

Theorem 6.6

Control systems (26) and (29) have the following ensemble controllability properties:

i) for d > 1 system (29) is ensemble controllable on $\mathbb {T}^{d}$;

ii) for ${\mathscr{M}}=\mathbb {T}^{1}$ system (26) is C⁰-approximately controllable in $\text {Diff}_{0}(\mathbb {T}^{1})$;

iii) two finite ensembles on $\mathbb {T}^{1}$ can be steered one into another by means of control system (26) in time T > 0, if and only if they are equally ordered.
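The cyclic ordering condition of item iii) can be tested as follows. This is a minimal sketch, assuming that the points of each ensemble are pairwise distinct and given by angle coordinates; the function names are illustrative.

```python
import math

def cyclic_order(angles):
    """Indices sorted counterclockwise, rotated so that index 0 comes first."""
    two_pi = 2.0 * math.pi
    order = sorted(range(len(angles)), key=lambda i: angles[i] % two_pi)
    start = order.index(0)
    return order[start:] + order[:start]

def equally_ordered_on_circle(initial, target):
    """True if both ensembles have the same index sequence up to a cyclic shift."""
    if len(initial) != len(target) or not initial:
        return False
    return cyclic_order(initial) == cyclic_order(target)

# Rotating all points by a common angle keeps the cyclic order.
rotated = equally_ordered_on_circle([0.1, 2.0, 4.0], [3.0, 5.0, 0.5])
# Exchanging two points changes it.
exchanged = equally_ordered_on_circle([0.1, 2.0, 4.0], [3.0, 0.5, 5.0])
```

Normalizing both index sequences to start at index 0 removes the cyclic ambiguity, so a plain list comparison suffices.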

6.3 Ensemble Controllability on the 2-dimensional Sphere

We construct examples of control systems on the 2-dimensional sphere ${\mathbb S} \subset \mathbb {R}^{3}$, which demonstrate the property of ensemble controllability. Both examples are related to the study in [3] of the controllability of the Navier-Stokes equation on ${\mathbb S}$.

We consider a Riemannian structure on ${\mathbb S}$, induced by the Euclidean structure of $\mathbb {R}^{3} \supset {\mathbb S}$. If $f: {\mathbb S} \to \mathbb {R}$ is the restriction onto ${\mathbb S}$ of a smooth function $F:\mathbb {R}^{3} \to \mathbb {R}$, then the spherical gradient

$$\nabla_{\mathbb S} f(x)=\nabla F - \langle \nabla F , x \rangle Ex$$

is the projection of the gradient ∇F onto the tangent bundle $T\mathbb {S}$ to $\mathbb {S}$. Here Ex stands for the Euler vector field in $\mathbb {R}^{3}$: $Ex={\sum }_{i=1}^{3}x_{i}\partial _{i}$.

In general if X is a smooth vector field in $\mathbb {R}^{3}$, then the projection onto $T{\mathbb S}$ of the restriction of X to ${\mathbb S}$ is

$$\text{pr}_{\mathbb S} X(x)=X(x)-\langle X(x),x \rangle E(x), \ x \in {\mathbb S}, $$

which will be a smooth vector field on ${\mathbb S}$.
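As an illustration of the projection formula, the sketch below computes $\text{pr}_{\mathbb S} X(x)$ and the spherical gradient at a point of the unit sphere and leaves the tangency check to the reader or to a test. The polynomial $F(x)=x_{1}x_{2}$ and the sample point are hypothetical choices made for this example.

```python
import math

def dot(a, b):
    """Euclidean scalar product in R^3."""
    return sum(p * q for p, q in zip(a, b))

def project_to_sphere_tangent(X, x):
    """pr_S X(x) = X(x) - <X(x), x> x for a unit vector x."""
    radial = dot(X, x)
    return tuple(Xc - radial * xc for Xc, xc in zip(X, x))

def spherical_gradient(grad_F, x):
    """Spherical gradient of f = F|_S, computed from the ambient gradient grad_F."""
    return project_to_sphere_tangent(grad_F(x), x)

# Hypothetical example: F(x) = x_1 x_2, a quadratic harmonic polynomial.
grad_F = lambda x: (x[1], x[0], 0.0)
x = (1.0 / math.sqrt(3.0),) * 3
tangent = spherical_gradient(grad_F, x)
radial_part = dot(tangent, x)  # should vanish up to rounding
```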

Consider standard symplectic structure σ_x(⋅,⋅) on ${\mathbb S} \subset \mathbb {R}^{3}$ defined by the area form. For $x \in {\mathbb S}$, $\xi , \eta \in T_{x} {\mathbb S}$ one has σ_x(ξ, η) = 〈x, ξ, η〉, where the latter trilinear form is the mixed product in $\mathbb {R}^{3}$.

We introduce the spherical divergence $\text {div}_{\mathbb S} \text {pr}_{\mathbb S} X(x)$ of $\text {pr}_{\mathbb S} X(x)$ with respect to the area form σ. To this end we consider the interior product of the vector field $\text {pr}_{\mathbb S} X(x)$ with the differential 2-form σ; it is the 1-form defined by

$$ \eta \to \sigma(\text{pr}_{\mathbb S} X(x) , \eta)= \langle x \times \text{pr}_{\mathbb S} X(x), \eta \rangle =\langle x \times X(x), \eta \rangle , $$

(31)

where × stands for the cross product in $\mathbb {R}^{3}$. The exterior derivative of the 1-form is the 2-form ψ(x)σ, whose coefficient ψ(x) coincides with the spherical divergence $\text {div}_{\mathbb S} \text {pr}_{\mathbb S} X(x)$.

To compute the exterior derivative we apply the Stokes theorem to the integral of the 1-form (31) along a closed curve on $\mathbb {S}$ and conclude that it equals the flux of the curl of the vector field x × X(x) through the spherical region enclosed by the curve. Hence

$$ \begin{array}{@{}rcl@{}} \text{div}_{\mathbb S} \text{pr}_{\mathbb S} X(x)=\langle \text{curl}\left( E(x) \times X(x)\right),E(x) \rangle=\langle (\text{div} X)E(x),E(x)\rangle - \\ \text{div} E(x)\langle X(x),E(x)\rangle= \text{div} X -3 \langle X(x),E(x)\rangle . \end{array} $$

In particular $\text {div}_{\mathbb S} \text {pr}_{\mathbb S} X(x)=\text {div } X$, if X is tangent to $\mathbb {S}$.

Having defined the spherical divergence $\text {div}_{\mathbb S}$ and the spherical gradient $\nabla _{\mathbb S}$, we define the spherical Laplacian of a function f on ${\mathbb S}$ as

$${\Delta}_{\mathbb S} f= \text{div}_{\mathbb S} \nabla_{\mathbb S} f. $$

Consider the homogeneous harmonic polynomials on $\mathbb {R}^{3} \setminus 0$ and take their restrictions onto ${\mathbb S}$; those are called spherical harmonics. We call them linear, quadratic, cubic, of nth degree, etc., if they are restrictions of homogeneous polynomials of the corresponding degree. Spherical harmonics are the eigenfunctions of the spherical Laplacian.

Restriction of any smooth function φ in $\mathbb {R}^{3}$ onto ${\mathbb S}$ gives rise to the Hamiltonian vector field $\overrightarrow {\varphi }$ on ${\mathbb S}$, which is defined by the relation:

$$(\overrightarrow{\varphi}(x),\eta)=\sigma(\nabla \varphi (x),\eta)=\langle x , \nabla \varphi(x) , \eta \rangle =(x \times \nabla \varphi (x), \eta), \ \eta \in T_{x}{\mathbb S};$$

hence $\overrightarrow {\varphi }(x)=x \times \nabla \varphi (x), \ x \in \mathbb {S}$.
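The formula $\overrightarrow {\varphi }(x)=x \times \nabla \varphi (x)$ is straightforward to evaluate. In the sketch below the linear harmonic $\varphi (x)=\langle l, x \rangle$ with $l=(0,0,1)$ and the sample point are hypothetical choices; the resulting field is tangent to the sphere and orthogonal to $\nabla \varphi$, so φ is constant along its flow.

```python
def cross(a, b):
    """Cross product in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Euclidean scalar product in R^3."""
    return sum(p * q for p, q in zip(a, b))

def hamiltonian_field(grad_phi, x):
    """Hamiltonian vector field x × grad(phi)(x) at a point x of the unit sphere."""
    return cross(x, grad_phi(x))

# Hypothetical linear harmonic phi(x) = <l, x> with l = (0, 0, 1); its gradient is l.
l = (0.0, 0.0, 1.0)
x = (0.6, 0.0, 0.8)
field = hamiltonian_field(lambda p: l, x)
```

For a linear harmonic the field is $x \times l$, an infinitesimal rotation about the axis l, in agreement with the later use of $e^{\overrightarrow {l}} \in SO(3)$.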

We provide an example of a Hamiltonian control system which is approximately controllable in the group of area-preserving diffeomorphisms of ${\mathbb S}$.

Theorem 6.7

Given three independent linear harmonics (l¹, x), (l², x), (l³, x), a quadratic harmonic q(x), a cubic harmonic c(x) and the corresponding Hamiltonian vector fields

$$ \overrightarrow{l}^{1}(x), \overrightarrow{l}^{2}(x),\overrightarrow{l}^{3}(x), \overrightarrow{q}(x), \overrightarrow{c}(x), $$

(32)

the control system

$$ \dot{x}=\sum\limits_{i=1}^{3}\overrightarrow{l}^{i}(x) u_{i}(t)+ \overrightarrow{q}(x)v_{2}(t)+\overrightarrow{c}(x)v_{3}(t) $$

(33)

is controllable in the space of finite ensembles on $\mathbb {S} $ and approximately controllable in the group $\text {SDiff}_{0}\left ({\mathbb S}\right )$ of the area-preserving diffeomorphisms of $\mathbb {S} $.

Proof

The following statement has been proved in [3, Theorem 10.4].

Proposition 6.8

The Lie algebra generated by the Hamiltonian vector fields (32) contains all the symplectic vector fields $\overrightarrow {h}$, which correspond to harmonic homogeneous polynomials (spherical harmonics) h(x), and therefore is dense in the space of all the divergence-free vector fields.

Spherical harmonics form a complete system in $L_{2}(\mathbb {S})$. To prove the Lie algebra strong approximating condition we consider the expansions of functions on $\mathbb {S} $ in Laplace series with respect to spherical harmonics. We apply a result by M.Ganesh, I.G.Graham & J.Sivaloganathan [9, Theorem 3.5] on the best approximation by Laplace series of smooth functions on the spheres ${\mathbb S}^{m}$ together with their derivatives up to some order.

Lemma 6.9

Let $C({\mathbb S})$ be the space of continuous functions on the sphere and $\mathcal {P}_{n}$ be the space of spherical polynomials of degree ≤ n. For each n ≥ 1 there exists a continuous linear operator ${\mathcal T_{n}}: C({\mathbb S}) \to \mathcal {P}_{n}$ and for every l ≥ 0 a constant b_l such that for all $k=0, {\ldots } , l$ and all $f \in C^{l}({\mathbb S})$

$$\left\|f - {\mathcal T}_{n} f \right\|_{C^{k}} \leq b_{l}\left( \frac{1}{n}\right)^{l-k}\|f\|_{C^{l}}. $$

(This result builds on the previous work by D.L.Ragozin and D.J.Newman & H.S. Shapiro; see references in [9]).

Let Y (x) be a C²-smooth divergence-free (Hamiltonian) vector field on ${\mathbb S}$ and Υ the corresponding C³-smooth Hamiltonian. By Lemma 6.9

$$ \|{\varUpsilon} - {\mathcal T}_{n} {\varUpsilon}\|_{C^{2}} \leq \frac{b_{2}}{n}\|{\varUpsilon}\|_{C^{3}} $$

for some constant b₂ > 0.

This implies that ${\mathcal T}_{n} {\varUpsilon }$ and its first and second derivatives $D{\mathcal T}_{n} {\varUpsilon }$, $D^{2}{\mathcal T}_{n} {\varUpsilon }$ converge uniformly to ${\varUpsilon }$, $D{\varUpsilon }$, $D^{2}{\varUpsilon }$, respectively, as $n \to \infty $. This means that the Hamiltonian vector fields $\overrightarrow {{\mathcal T}_{n} {\varUpsilon }}$ converge uniformly to Y, and their derivatives $D\overrightarrow {{\mathcal T}_{n} {\varUpsilon }}$ converge uniformly to DY as $n \to \infty $. Hence the derivatives $D\overrightarrow {{\mathcal T}_{n} {\varUpsilon }}$ are equibounded, and the vector fields $\overrightarrow {{\mathcal T}_{n} {\varUpsilon }}$ are equilipschitzian. According to Proposition 6.8 the vector fields $\overrightarrow {{\mathcal T}_{n} {\varUpsilon }}$ belong to the Lie algebra generated by the vector fields (32), and hence the Lie algebra strong approximating condition holds for control system (33). □

We now pass to an example of a control system which is approximately controllable in the group of smooth diffeomorphisms $\text {Diff}_{0}\left ({\mathbb S}\right )$ of ${\mathbb S}$.

By the Helmholtz-Hodge theorem each smooth vector field f on ${\mathbb S}$ can be represented as a sum of a gradient vector field $f^{\nabla }=\nabla _{\mathbb S} F$ and an area-preserving (and symplectic in the 2D case) vector field f^⊩. One may think of constructing the desired example by joining some gradient vector fields to the Hamiltonian vector fields (32).

Theorem 6.10

Let $\overrightarrow {l}^{1}(x), \overrightarrow {l}^{2}(x),\overrightarrow {l}^{3}(x), \overrightarrow {q}(x), \overrightarrow {c}(x)$ be the Hamiltonian vector fields (32). Let $\tilde l(x)=(l,x)$ and $\tilde {q}(x)$ be a linear and a quadratic spherical harmonic, and let $\tilde l^{\prime } (x)=\nabla _{\mathbb S} (l,x)$, $\tilde {q}^{\prime }(x)=\nabla _{\mathbb S}\tilde {q}(x)$ be the corresponding gradient vector fields.

The control system on the 2-dimensional sphere ${\mathbb S}$

$$ \dot{x}=\sum\limits_{i=1}^{3}\overrightarrow{l}^{i}(x) u_{i}(t)+ \overrightarrow{q}(x)v_{2}(t)+\overrightarrow{c}(x)v_{3}(t) +\tilde{l}^{\prime} (x)w_{1}(t) +\tilde{q}^{\prime}(x)w_{2}(t) $$

(34)

is controllable in the space of finite ensembles on ${\mathbb S}$ and approximately controllable in the group $\text {Diff}_{0}\left ({\mathbb S}\right )$ of the diffeomorphisms of ${\mathbb S}$.

Proof

Finite ensemble controllability follows immediately from the previous theorem. The key technical result for proving controllability in $\text {Diff}_{0}\left ({\mathbb S}\right )$ is the following.

Proposition 6.11

The Lie algebra ${\mathscr{L}}$, generated by the vector fields

$$ \overrightarrow{l}^{1}(x), \overrightarrow{l}^{2}(x), \overrightarrow{l}^{3}(x), \overrightarrow{q}(x), \overrightarrow{c}(x), \tilde l^{\prime}(x), \tilde{q}^{\prime}(x),$$

contains all the Hamiltonian vector fields $\overrightarrow {h}$ and all the gradient vector fields $\nabla _{\mathbb S} h$, corresponding to all the spherical harmonics h on ${\mathbb S}$.

The Lie algebra strong approximating property follows from this fact by virtue of the approximation results for spherical harmonics and Laplace series (Lemma 6.9), which we used above in the proof of Theorem 6.7.

Let ${\mathscr{L}}_{div}$ be the image of the linear space ${\mathscr{L}}$ under the action of the linear operator $\text {div}_{\mathbb S}$.

Proposition 6.12

The linear space ${\mathscr{L}}_{div}$ contains all the spherical harmonics on ${\mathbb S}$.

Assuming the result to hold, we complete the proof of Proposition 6.11. Let h be any spherical harmonic, which without loss of generality we may assume to be homogeneous. If $h=\text {div}_{\mathbb S} f$ with $f \in {\mathscr{L}}$, then, since $\text {div}_{\mathbb S} \nabla _{\mathbb S} h=\alpha h$, where α is the corresponding eigenvalue of the spherical Laplacian, the vector field $\overrightarrow {p}=\nabla _{\mathbb S} h - \alpha f$ is divergence-free and therefore a symplectic polynomial vector field on ${\mathbb S}$. Without loss of generality one may assume that p is the restriction onto ${\mathbb S}$ of a harmonic polynomial $\hat p$ (see Note 2). All polynomial symplectic vector fields that correspond to spherical harmonics belong to ${\mathscr{L}}$ by Proposition 6.8, and hence $\nabla _{\mathbb S} h \in {\mathscr{L}}$.

Employing Maxwell’s theorem we can reduce Proposition 6.12 to a weaker statement.

Lemma 6.13

The linear space ${\mathscr{L}}_{div}$ contains all the spherical harmonics if and only if, for each k, ${\mathscr{L}}_{div}$ contains a homogeneous spherical harmonic of degree k.

Proof

For each $ l \in \mathbb {R}^{3}$ the symplectic vector field $\overrightarrow {l}$ defines the rotation $e^{\overrightarrow {l}} \in SO(3)$ of $\mathbb {R}^{3}$ and of the sphere $\mathbb {S}$. By direct computation, the adjoint action $\text {Ad} \left (e^{\overrightarrow {l}}\right )$ of the rotation onto a gradient vector field $\nabla _{\mathbb S}f(x)$ transforms it into the gradient vector field $ \nabla _{\mathbb S}f(e^{\overrightarrow {l}}(x))$. By Maxwell’s theorem [5] the group of rotations $e^{\overrightarrow {l}}$ acts transitively on the space of spherical harmonics of a given degree.

By the assumptions of the theorem the Lie algebra ${\mathscr{L}}$ contains linearly independent vector fields $\overrightarrow {l}^{1}(x)$, $\overrightarrow {l}^{2}(x)$, $\overrightarrow {l}^{3}(x)$. If a spherical harmonic h is homogeneous of degree k and belongs to ${\mathscr{L}}_{div}$, then by the aforesaid $\nabla _{\mathbb S} h \in {\mathscr{L}}$, and acting onto $\nabla _{\mathbb S} h$ by $\text {Ad}\left (e^{\overrightarrow {l}}\right ), \ l \in \mathbb {R}^{3}$ we conclude by transitivity that the gradients of all spherical harmonics of degree k are in ${\mathscr{L}}$ and then the harmonics themselves are in ${\mathscr{L}}_{div}$. □

To prove the existence of spherical harmonics of each degree in ${\mathscr{L}}_{div}$ we start with two technical lemmas, whose proofs can be found in the Appendix.

Lemma 6.14

For a harmonic polynomial F, which is homogeneous of degree k in $\mathbb {R}^{3}$, there holds

$$ \begin{array}{@{}rcl@{}} \langle \nabla F(x) , x \rangle =k F(x), \ \ D^{2}F(x) x=(k-1) \nabla F(x), \\ {[ \nabla F(x), Ex]}= (2-k) \nabla F(x) . \end{array} $$

Lemma 6.15

For f, g, which are the restrictions onto $\mathbb {S}$ of the harmonic polynomials F, G, homogeneous of degrees k and l in $\mathbb {R}^{3}$, there holds

$$ \begin{array}{@{}rcl@{}} \text{div}_{\mathbb S} [\nabla_{\mathbb S} f, \nabla_{\mathbb S} g]=\text{div} [\nabla_{\mathbb S} f, \nabla_{\mathbb S} g]= \\ (k-l)(k+l+3)\left( \langle \nabla F , \nabla G \rangle|_{\mathbb S} - kl fg\right) . \end{array} $$

(35)

Corollary 6.16

Let $g(x)=x_{3}|_{\mathbb S}$ and $f(x)=F(x_{1},x_{2})|_{\mathbb S}$ be the restriction onto $\mathbb {S}$ of the harmonic polynomial F(x₁, x₂) homogeneous of degree k in the variables x₁, x₂. Then

$$ \text{div}_{\mathbb S}[\nabla_{\mathbb S} f, \nabla_{\mathbb S} g] =-(k-1)(k+4)kx_{3}f(x_{1},x_{2}) $$

(36)

and the right-hand side is a spherical harmonic polynomial homogeneous of degree k + 1.

We prove the corollary. Formula (36) follows from (35). Since x₃ and F(x₁, x₂) are both harmonic,

$${\Delta} (x_{3}F(x_{1},x_{2}))(x)=2\langle \nabla x_{3}, \nabla F(x_{1},x_{2}) \rangle =0$$

and hence x₃F(x₁, x₂) is harmonic in $\mathbb {R}^{3}$ and the restriction $x_{3}F(x_{1},x_{2})|_{\mathbb {S}}$ is a spherical harmonic of degree k + 1.
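The harmonicity of $x_{3}F(x_{1},x_{2})$ can be checked numerically for concrete F. The sketch below takes $F = \text{Re}\,(x_{1}+ix_{2})^{k}$, which is one possible harmonic polynomial homogeneous of degree k in two variables (an assumption made for illustration), and evaluates a central-difference Laplacian in $\mathbb {R}^{3}$ at a sample point.

```python
def planar_harmonic(k, x1, x2):
    """Re (x1 + i x2)^k, a harmonic polynomial homogeneous of degree k."""
    return (complex(x1, x2) ** k).real

def product(k, x):
    """x3 * F(x1, x2) for the planar harmonic F of degree k."""
    return x[2] * planar_harmonic(k, x[0], x[1])

def laplacian(func, x, h=1e-3):
    """Second-order central-difference Laplacian in R^3."""
    total = 0.0
    for a in range(3):
        plus = list(x)
        plus[a] += h
        minus = list(x)
        minus[a] -= h
        total += (func(plus) + func(minus) - 2.0 * func(x)) / (h * h)
    return total

point = [0.3, -0.7, 0.5]
residuals = [abs(laplacian(lambda p, k=k: product(k, p), point))
             for k in range(1, 6)]
```

The residuals are not exactly zero because of the finite-difference truncation, but they stay at the level of the discretization error, consistent with $\Delta (x_{3}F)=0$.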

Now we complete the proof of Proposition 6.12. Since the linear harmonic vector field $\tilde {l}^{\prime }=\nabla _{\mathbb S} \tilde l$ and the quadratic harmonic vector field $\tilde {q}^{\prime }=\nabla _{\mathbb S} \tilde q$ belong to ${\mathscr{L}}$ and the group of rotations $e^{\overrightarrow {l}}$ acts transitively on the space of spherical harmonics of a given degree, we can obtain by this action the gradients of all the spherical harmonics of degrees 1 and 2 and, in particular, $\nabla _{\mathbb S} x_{3}$ and $\nabla _{\mathbb S} f(x_{1},x_{2})$ for a quadratic harmonic $f(x_{1},x_{2})$.

Then $ [\nabla _{\mathbb S} f(x_{1},x_{2}), \nabla _{\mathbb S} x_{3}] \in {\mathscr{L}}$ and by Corollary 6.16 with k = 2

$$ \text{div}_{\mathbb S}[\nabla_{\mathbb S} f(x_{1},x_{2}), \nabla_{\mathbb S} x_{3}] =-12x_{3} f(x_{1},x_{2})$$

with the right-hand side being a spherical harmonic of degree 3, which belongs to ${\mathscr{L}}_{div}$. Then by Maxwell’s theorem we conclude that the gradients of all the spherical harmonics of degree 3 belong to ${\mathscr{L}}$. The proof is completed by induction on the degree of the harmonics, with Corollary 6.16 applied at each induction step. □

Notes

1. It is worth mentioning that the theoretical study of discrete-time optimal control problems exhibits additional complexities in comparison with the continuous-time case, unless additional regularity assumptions, such as convexity, are imposed.
2. Any restriction of a polynomial in $\mathbb {R}^{3}$ onto ${\mathbb S}$ can be represented as a restriction onto ${\mathbb S}$ of a harmonic (nonhomogeneous) polynomial.
3. The convergence is determined by the interplay of two entities: the Christoffel constant Λ_n (or the related Lebesgue constant) and the approximation error rate E_n of the function $Y_{j}(x)e^{\gamma (x)}$ by means of the n-truncations of the Hermite series. For the uniform convergence it suffices [7, Proposition 7.1.2] that ${\varLambda }_{n} \sim n^{-d}$ as $n \to \infty $ and $|E_{n}| \leq n^{-\frac {d}{2}-\beta }, \ \beta >0$. For the first fact see [7, Proposition 7.1.5]; for the second fact, valid for $C^{\left [\frac {d}{2}\right ]+2}$-smooth functions, see [7, Corollary 7.1.3].

References

1. Agrachev A, Gamkrelidze R. The exponential representation of flows and the chronological calculus. Math USSR Sbornik 1979;35:727–785.
2. Agrachev A, Sachkov Y. Control theory from the geometric viewpoint. Berlin-Heidelberg-New York: Springer; 2004.
3. Agrachev A, Sarychev A. Solid controllability in fluid dynamics. Instability in models connected with fluid flows, I. New York: Springer; 2008, pp. 1–35.
4. Agrachev A, Sarychev A. Control in the spaces of ensembles of points. SIAM J Control Optim 2020;58:1579–1596.
5. Arnold VI. Lectures on partial differential equations. Berlin-Heidelberg: Springer; 2004.
6. Cuchiero C, Larsson M, Teichmann J. Deep neural networks, generic universal interpolation, and controlled ODEs. SIAM J Math Data Sci 2020;2:901–919.
7. Dunkl CF, Xu Y. Orthogonal polynomials of several variables, 2nd ed. Cambridge: Cambridge University Press; 2014.
8. Esteve C, Geshkovski B, Pighin D, Zuazua E. Large-time asymptotics in deep learning. HAL archives ouvertes; 2020. https://hal.archives-ouvertes.fr/hal-02912516.
9. Ganesh M, Graham IG, Sivaloganathan J. A pseudospectral three-dimensional boundary integral method applied to a nonlinear model problem from finite elasticity. SIAM J Numer Anal 1994;31:1378–1414.
10. Higham CF, Higham DJ. Deep learning: an introduction for applied mathematicians. SIAM Rev 2019;61:860–891.
11. Li Q, Chen L, Tai C, W E. Maximum principle based algorithms for deep learning. J Machine Learning Res 2018;18:1–29.
12. Tabuada P, Gharesifard B. Universal approximation power of deep neural networks via nonlinear control theory. arXiv preprint arXiv:2007.06007; 2020.

Acknowledgements

The authors are grateful to the anonymous referee for useful comments.

Funding

Open access funding provided by Università degli Studi di Firenze within the CRUI-CARE Agreement. The work of A.A. Agrachev is supported by the Russian Science Foundation under grant 17-11-01387-P.

Author information

Authors and Affiliations

SISSA, via Bonomea 265, Trieste, 34136, Italy
Andrei Agrachev
DiMaI, University of Florence, viale G.B.Morgagni, 67a, Firenze, 50134, Italy
Andrey Sarychev

Corresponding author

Correspondence to Andrey Sarychev.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix. Proofs of technical lemmas

A.1 Proof of Lemma 6.2

Since the function $Y_{j}(x)e^{\gamma (x)}$ is $C^{\left [\frac {d}{2}\right ]+2}$-smooth, the partial sums of series (25) converge uniformly to it according to [7, Propositions 7.1.2, 7.1.5, Corollary 7.1.3] (see Note 3).

Thus for each ε > 0 one can find sufficiently large n for which the partial sums $S_{n}(x)={\sum }_{m: \ |m| \leq n} c_{m_{1}, {\ldots } , m_{d}} H_{m_{1}, {\ldots } , m_{d}}(x)$ satisfy

$$\|S_{n}(x) - Y_{j}(x)e^{\gamma (x)}\|_{0,K} < \varepsilon , $$

and hence

$$\|S_{n}(x)e^{-\gamma (x)} - Y_{j}(x)\|_{0,K} < \varepsilon . $$

To get a bound for the first partial derivative, say in x₁, of the functions $S_{n}(x)e^{-\gamma (x)}$ we note that

$$ \frac{\partial}{\partial x_{1}} \left( H_{m_{1}, {\ldots} , m_{d}}(x)e^{-\gamma (x)}\right)=- H_{m_{1}+1, {\ldots} , m_{d}}(x)e^{-\gamma (x)} , $$

and therefore

$$ \frac{\partial}{\partial x_{1}}\left( S_{n}(x)e^{-\gamma (x)}\right)= -\sum\limits_{m: \ |m| \leq n} c_{m_{1}, {\ldots} , m_{d}} H_{m_{1}+1, {\ldots} , m_{d}}(x) e^{-\gamma (x)}. $$

We prove that the series $-{\sum }_{m} c_{m_{1}, {\ldots } , m_{d}} H_{m_{1}+1, {\ldots } , m_{d}}(x)$ is the Fourier-Hermite series for the function $\frac {\partial Y_{j}(x)}{\partial x_{1}}e^{\gamma (x)} \in C^{\left [\frac {d}{2}\right ]+1}$, and hence its partial sums multiplied by $e^{-\gamma (x)}$ converge uniformly to $\frac {\partial Y_{j}(x)}{\partial x_{1}}$ as $n \to \infty $.

Multivariate Hermite polynomials are factorable into the products of univariate Hermite polynomials:

$$ H_{m_{j1}, {\ldots} , m_{jd}}(x_{1}, {\ldots} , x_{d})=H_{m_{j1}}(x_{1}) {\cdots} H_{m_{jd}}(x_{d})$$

and therefore we may proceed as in the univariate case. It suffices to prove that $ Y_{j}(x)e^{\gamma (x)} \sim {\sum }_{m} c_{m} H_{m}(x), \ x \in \mathbb {R} $ implies

$$ Y^{\prime}_{j}(x)e^{\gamma(x)} \sim -\sum\limits_{m} c_{m} H_{m+1}(x), \ x \in \mathbb{R} . $$

(37)

From the formulae for the Fourier-Hermite coefficients it follows that

$$ c_{m}= \frac{{\int}_{\mathbb{R}}Y_{j}(x)H_{m}(x)dx}{{\int}_{\mathbb{R}}(H_{m}(x))^{2} e^{-\gamma(x)}dx}.$$

Since $H^{\prime }_{m+1}(x)=(m+1)H_{m}(x)$ we get

$$c_{m}= \frac{{\int}_{\mathbb{R}}Y_{j}(x)H^{\prime}_{m+1}(x)dx}{(m+1){\int}_{\mathbb{R}}(H_{m}(x))^{2} e^{-\gamma(x)}dx} . $$

From the identities ${\int \limits }_{\mathbb {R}}(H_{m}(x))^{2} e^{-\gamma (x)}dx=\sqrt {2\pi }m!, \ m=0,1,2 {\ldots } $ we conclude that the denominator coincides with ${\int \limits }_{\mathbb {R}}(H_{m+1}(x))^{2} e^{-\gamma (x)}dx$. Integrating the numerator by parts we bring it to the form $-{\int \limits }_{\mathbb {R}}Y^{\prime }_{j}(x)H_{m+1}(x)dx$ and thus conclude (37).
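The norm identity used above can be confirmed numerically. The sketch below assumes the probabilists' normalization of the Hermite polynomials and approximates the weighted integral by the trapezoidal rule on a truncated interval; the truncation width and the number of steps are illustrative parameters.

```python
import math

def hermite_he(n, x):
    """Probabilists' Hermite polynomial He_n(x) via He_{k+1} = x He_k - k He_{k-1}."""
    prev, cur = 1.0, x
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, x * cur - k * prev
    return cur

def weighted_norm_sq(m, half_width=12.0, steps=4800):
    """Trapezoidal approximation of the integral of He_m(x)^2 exp(-x^2 / 2) over R."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * hermite_he(m, x) ** 2 * math.exp(-0.5 * x * x)
    return total * h

# Ratio of the numerical integral to sqrt(2 pi) m!; each entry should be close to 1.
ratios = [weighted_norm_sq(m) / (math.sqrt(2.0 * math.pi) * math.factorial(m))
          for m in range(5)]
```

Multiplying the norm for index m by m + 1 gives the norm for index m + 1, which is the step that turns the denominator of $c_{m}$ into ${\int }_{\mathbb {R}}(H_{m+1}(x))^{2} e^{-\gamma (x)}dx$.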

By the above-cited approximation results from [7], the partial derivatives $\frac {\partial }{\partial x_{i}}\left (S_{n}(x)e^{-\gamma (x)}\right )$ converge uniformly to $\frac {\partial Y_{j}(x)}{\partial x_{i}}$ as $n \to \infty $ and hence are uniformly bounded in $n$.

1.2 A.2 Proof of Lemma 6.14

The first equality is the well-known Euler identity for homogeneous functions.

Differentiating the identity

$$ \forall t \in \mathbb{R}, \ x,y \in \mathbb{R}^{3}: \ \nabla F(x+t y) \cdot (x+ty)=k F(x+ty) $$

with respect to $t$ at $t = 0$ we obtain

$$ D^{2}F(x)y \cdot x +\nabla F (x) \cdot y =k \nabla F (x) \cdot y $$

and hence, by the symmetry of the Hessian $D^{2}F(x)$, $\forall y: \ D^{2}F(x)x \cdot y = (k-1)\nabla F(x) \cdot y$, from which the second equality follows.

The third equality follows from the previous two directly.
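As an illustration, the Euler identity $\nabla F(x)\cdot x=kF(x)$ and the relation $D^{2}F(x)x=(k-1)\nabla F(x)$ derived above can be checked exactly on a sample polynomial. The sketch below uses the test function $F=x_{1}^{3}-3x_{1}x_{2}^{2}+x_{1}x_{2}x_{3}$, homogeneous of degree $k=3$. This choice of $F$ and the dictionary representation of polynomials are assumptions made only for the example.

```python
def partial(poly, i):
    """Partial derivative in variable i of a polynomial {exponent tuple: coefficient}."""
    out = {}
    for exps, c in poly.items():
        if exps[i] > 0:
            e = list(exps)
            e[i] -= 1
            out[tuple(e)] = out.get(tuple(e), 0) + c * exps[i]
    return out


def times_var(poly, i):
    """Multiply a polynomial by the variable x_i."""
    out = {}
    for exps, c in poly.items():
        e = list(exps)
        e[i] += 1
        out[tuple(e)] = c
    return out


def add(p, q):
    """Sum of two polynomials, dropping zero coefficients."""
    out = dict(p)
    for key, c in q.items():
        out[key] = out.get(key, 0) + c
    return {key: c for key, c in out.items() if c != 0}


def scale(p, s):
    """Multiply a polynomial by the integer s."""
    return {key: s * c for key, c in p.items() if s * c != 0}


def euler_field(poly):
    """x . grad(poly) = sum_i x_i * d(poly)/dx_i."""
    total = {}
    for i in range(3):
        total = add(total, times_var(partial(poly, i), i))
    return total


# Sample homogeneous polynomial of degree k = 3 (an assumption for this example).
k = 3
F = {(3, 0, 0): 1, (1, 2, 0): -3, (1, 1, 1): 1}

# Euler identity: grad F(x) . x = k F(x)
assert euler_field(F) == scale(F, k)

# Component j of D^2 F(x) x is sum_i x_i d_i d_j F, which should equal (k-1) d_j F.
for j in range(3):
    assert euler_field(partial(F, j)) == scale(partial(F, j), k - 1)
```

The second check is the Euler identity applied to each $\partial F/\partial x_{j}$, which is homogeneous of degree $k-1$.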

1.3 A.3 Proof of Lemma 6.15

By direct computation, using the Euler identity:

$$ \begin{array}{@{}rcl@{}} [\nabla_{\mathbb S} f , \nabla_{\mathbb S} g]= [\text{pr} \nabla F, \text{pr} \nabla G]= [\nabla F - \langle \nabla F, x\rangle E(x), \nabla G - \langle \nabla G, x\rangle E(x) ]= \\ {[\nabla F(x) - (kF(x))E(x), \nabla G(x) - (lG(x))E(x) ]}. \end{array} $$

A straightforward manipulation, using the identities of Lemma 6.14, gives

$$ \begin{array}{@{}rcl@{}} [\nabla_{\mathbb S} f , \nabla_{\mathbb S} g]=[\nabla F, \nabla G] -(2-k)(lG(x))\nabla F(x)+(2-l)(kF(x))\nabla G(x)+ \\ (k-l)\langle \nabla F(x) , \nabla G(x) \rangle E(x)+ kl(l-k)(F(x)G(x))E(x) . \end{array} $$

Recall that for $F, G$ harmonic in $\mathbb {R}^{3}$ the gradients $\nabla F, \nabla G$ are divergence-free, and so is their Lie bracket $[\nabla F, \nabla G]$.

Calculating the divergence of the right-hand side and using the identities of Lemma 6.14, we obtain the desired result.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Agrachev, A., Sarychev, A. Control on the Manifolds of Mappings with a View to the Deep Learning. J Dyn Control Syst 28, 989–1008 (2022). https://doi.org/10.1007/s10883-021-09561-2


Received: 13 March 2021
Revised: 13 March 2021
Accepted: 26 June 2021
Published: 14 August 2021
Issue Date: October 2022
DOI: https://doi.org/10.1007/s10883-021-09561-2
