## 1 Introduction

The presence of constraints renders optimization problems not only more interesting, but also more difficult to analyze and solve. Constrained nonlinear optimization problems on $$\mathbb {R}^m$$ can be cast in the following form,

\begin{aligned} \begin{aligned} \text {Minimize} \quad&f(x) , \quad \text {where } x \in \mathbb {R}^m \\ \text {subject to (s. t.)} \quad&g(x) \in K . \end{aligned} \end{aligned}
(1.1)

Here $$f :\mathbb {R}^m \rightarrow \mathbb {R}$$ denotes the objective function, and $$g :\mathbb {R}^m \rightarrow \mathbb {R}^n$$ represents the constraint function. Moreover, $$K \subset \mathbb {R}^n$$ is a convex cone satisfying $$0 \in K$$, i.e., it induces a preorder on $$\mathbb {R}^n$$ defined by

\begin{aligned} y \le _K z \quad \Leftrightarrow \quad y - z \in K. \end{aligned}

The constraint in (1.1) can thus be written as $$g(x) \le _K 0$$.

Problems of the form (1.1) include classical nonlinear programming problems with equality and inequality constraints. These are obtained for the choice $$K = \mathbb {R}_-^k \times \{0\}^{n-k} \subset \mathbb {R}^k \times \mathbb {R}^{n-k}$$, where $$\mathbb {R}_-^k$$ is the non-positive orthant in $$\mathbb {R}^k$$, so that the first k components of g are inequality constraints and the remaining $$n-k$$ components are equality constraints.
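As a minimal sketch (not part of the original text), membership in the cone $$K = \mathbb {R}_-^k \times \{0\}^{n-k}$$ can be checked componentwise. The function name `in_cone` and the tolerance `tol` are illustrative assumptions.

```python
def in_cone(y, k, tol=1e-12):
    """Return True if y lies in R_-^k x {0}^(n-k), up to the tolerance tol."""
    # First k components are inequality constraints: y_i <= 0.
    inequalities_ok = all(y_i <= tol for y_i in y[:k])
    # Remaining n-k components are equality constraints: y_i = 0.
    equalities_ok = all(abs(y_i) <= tol for y_i in y[k:])
    return inequalities_ok and equalities_ok


# Two inequality components followed by one equality component (n = 3, k = 2).
print(in_cone([-1.0, 0.0, 0.0], k=2))  # feasible
print(in_cone([-1.0, 0.5, 0.0], k=2))  # violates the second inequality
print(in_cone([-1.0, 0.0, 0.3], k=2))  # violates the equality
```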

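It is well known that, under appropriate constraint qualifications, local minimizers of (1.1) admit Lagrange multipliers, i.e., there exists $$\mu \in \mathbb {R}^n$$ such that

\begin{aligned} f'(x) + \mu \, g'(x) = 0 , \quad \mu \, y \le 0 \text { for all } y \in K , \quad \mu \, g(x) = 0 \end{aligned}
(1.2)

holds. In short, we can write $$f'(x) + \mu \, g'(x) = 0$$ with $$\mu \in K^\circ$$ and $$\mu \, g(x) = 0$$. The set $$K^\circ$$ is called the polar cone of K.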

The conditions (1.2) are known as the generalized Karush–Kuhn–Tucker (KKT) conditions pertaining to problem (1.1). We refer the reader to, e. g., [9, Ch. 9], [15, 18], [5, Ch. 5], [16, Ch. 6] for results in this direction in finite and infinite-dimensional spaces.
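To make the KKT conditions concrete, the following sketch evaluates them for a small hypothetical problem that does not appear in the original text: minimize $$x_1 + x_2$$ subject to $$x_1^2 + x_2^2 - 2 \le 0$$, i.e., $$K = \mathbb {R}_-$$ and $$K^\circ = [0,\infty )$$. The candidate point $$x = (-1,-1)$$ and multiplier $$\mu = 1/2$$ are chosen by hand.

```python
def kkt_residuals(x, mu):
    """KKT quantities for: minimize x1 + x2  s.t.  x1^2 + x2^2 - 2 <= 0."""
    grad_f = (1.0, 1.0)
    g = x[0] ** 2 + x[1] ** 2 - 2.0
    grad_g = (2.0 * x[0], 2.0 * x[1])
    stationarity = tuple(df + mu * dg for df, dg in zip(grad_f, grad_g))
    return {
        "stationarity": stationarity,  # f'(x) + mu g'(x), should vanish
        "feasibility": g,              # g(x), should be <= 0
        "multiplier": mu,              # should lie in the polar cone, i.e. mu >= 0
        "complementarity": mu * g,     # mu g(x), should vanish
    }


res = kkt_residuals(x=(-1.0, -1.0), mu=0.5)
for name, value in res.items():
    print(name, value)
```

All four quantities satisfy their respective conditions at this point, so $$\mu = 1/2$$ is a Lagrange multiplier for the candidate.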

In this paper, we generalize (1.1) to constrained optimization problems on manifolds, replacing $$\mathbb {R}^m$$ and $$\mathbb {R}^n$$ by finite-dimensional, smooth manifolds $$\mathcal {M}$$ and $$\mathcal {N}$$, respectively. Theory for the case of equality and inequality constraints $$g :\mathcal {M}\rightarrow \mathbb {R}^n$$ has been considered in [2, 17] and some algorithmic approaches have been discussed in [8, 12]. Theory and an algorithm for equality constraints of the form $$g(p)=q_*$$ with $$g :\mathcal {M}\rightarrow \mathcal {N}$$ were presented in [14]. Here we aim to incorporate equality and inequality constraints for manifold-valued constraint mappings $$g :\mathcal {M}\rightarrow \mathcal {N}$$.

Such an extension is not straightforward since there is no natural way to define a cone (nor a preorder) on the manifold $$\mathcal {N}$$ which would take the role of the condition $$g(x) \in K$$. We propose here to overcome this difficulty by requiring the constraint function to have values in a submanifold with corners $$\mathcal {K}\subset \mathcal {N}$$, a mathematical object that corresponds to a convex cone locally in adequate charts.

We thus consider the following class of problems,

\begin{aligned} \begin{aligned} \text {Minimize} \quad&f(p) , \quad \text {where } p \in \mathcal {M}\\ \text {s. t.} \quad&g(p) \in \mathcal {K}, \end{aligned} \end{aligned}
(1.3)

which generalizes (1.1). The description of the feasible set as $$\mathcal {F}= g^{-1}(\mathcal {K})$$ turns out to be convenient and relevant in a number of situations. Moreover, it will be shown that this description is independent of possibly varying parametrizations of the given problem.

Our formulation differs from other generalizations of equality and inequality constraints. Consider for instance a geodesic polygon as a feasible set $$\mathcal {F}$$, defined on the sphere $$\mathcal {M}= \mathcal {S}^{2}$$, i.e., a set bounded by geodesics. More generally, we can also consider a geodesic polyhedron on $$\mathcal {S}^{m}$$, i.e., a region bounded by a number of geodesic hyperplanes. In other words, its boundary consists of totally geodesic submanifolds, cf., e. g., [6, Ch. XI, §4]. An example of a geodesic polygon in $$\mathcal {S}^{2}$$ is shown in Figure 1. The set $$\mathcal {F}$$ constitutes a submanifold of $$\mathcal {N}= \mathcal {M}$$ with corners, so it can be naturally parametrized as $$g(p) \in \mathcal {K}$$ with $$g = id _\mathcal {M}$$ and $$\mathcal {K}= \mathcal {F}$$. By contrast, an algebraic description of $$\mathcal {F}$$ in terms of classical inequalities runs into difficulties. In the case of a vector space $$\mathcal {M}= \mathbb {R}^m$$, the analogue of $$\mathcal {F}$$ (an ordinary polygon) can be easily represented as the intersection of finitely many closed half spaces, using linear inequality constraints. A similar attempt to describe $$\mathcal {F}$$ on $$\mathcal {S}^{m}$$ via inequality constraints of the type $$g_i(p) = (\hbox {log}_{q_i}\, p, n_i) \le 0$$ can certainly be used locally; however, the exponential map on $$\mathcal {S}^{m}$$ is not injective, so its inverse, the logarithmic map, is not globally well-defined, and neither is this inequality constraint.
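The failure of the logarithmic map can be seen in a short sketch. It uses the standard closed-form expression for the logarithmic map on the unit sphere; the function name `sphere_log`, the tolerance, and the choice to return `None` at the antipodal point are illustrative assumptions.

```python
import math


def sphere_log(q, p, tol=1e-12):
    """Logarithmic map log_q(p) on the unit sphere, or None where it is not unique."""
    cos_theta = max(-1.0, min(1.0, sum(qi * pi for qi, pi in zip(q, p))))
    theta = math.acos(cos_theta)  # geodesic distance between q and p
    if theta < tol:
        return tuple(0.0 for _ in q)  # p coincides with q: zero tangent vector
    if math.pi - theta < tol:
        return None  # antipodal point: minimizing geodesics leave q in every direction
    scale = theta / math.sin(theta)
    return tuple(scale * (pi - cos_theta * qi) for qi, pi in zip(q, p))


north = (0.0, 0.0, 1.0)
print(sphere_log(north, (1.0, 0.0, 0.0)))   # tangent vector of length pi/2
print(sphere_log(north, (0.0, 0.0, -1.0)))  # None: the constraint g_i is undefined here
```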

This paper is structured as follows. We describe our approach to modeling manifold-valued constraints using manifolds with corners in Sect. 2. Constraint qualifications are introduced and discussed in Sect. 3. Section 4 is devoted to the derivation of first-order necessary optimality conditions. We show in Sect. 5 that equivalent conditions are obtained when the problem is pulled back to a tangent space, using a retraction. In Sect. 6 we introduce the analogue of a Lagrangian function for (1.3). In preparation for the formulation of second-order optimality conditions in Sect. 8, we define the critical cone in Sect. 7. Finally, Sect. 9 presents an application of our theory to the control of discretized variational problems.

We denote manifolds as well as subsets of manifolds by calligraphic letters. For an introduction to differentiable manifolds, we refer the reader, e. g., to [7]. Points on the manifold $$\mathcal {M}$$ are denoted by the letter p, while points on $$\mathcal {N}$$ are denoted by q. Each manifold comes with a collection of charts $$(\mathcal {U},\psi )$$, and each chart maps an open subset $$\mathcal {U}$$ of $$\mathcal {M}$$ (or $$\mathcal {N}$$) onto an open set in $$\mathbb {R}^m$$ (or $$\mathbb {R}^n$$), where m and n are the dimensions of $$\mathcal {M}$$ and $$\mathcal {N}$$, respectively. We say that a chart $$(\mathcal {U},\psi )$$ is centered at a point p if $$p \in \mathcal {U}$$ and $$\psi (p) = 0$$ hold. Since we will be pursuing a first- and second-order analysis, we will mostly assume that $$\mathcal {M}$$ and $$\mathcal {N}$$ are of class $$C^2$$, i.e., the chart transition maps $$\psi _2 \circ \psi _1^{-1}$$ are of this class. In chart space, we use the letters $$x \in \mathbb {R}^m$$ and $$y \in \mathbb {R}^n$$. We write $$C^j(\mathcal {M},\mathcal {N})$$ for the set of all mappings $$\mathcal {M}\rightarrow \mathcal {N}$$ which are j times continuously differentiable. The identity mappings on a vector space V or on a manifold $$\mathcal {M}$$ are denoted by $$id _V$$ and $$id _\mathcal {M}$$, respectively. The zero element in the tangent space $$\mathcal {T}_{p}{\mathcal {M}}$$ of a manifold $$\mathcal {M}$$ at p is denoted by $$0_p$$. We distinguish primal elements $$v \in \mathcal {T}_{p}{\mathcal {M}}$$ and dual elements $$\mu \in \mathcal {T}^*_{p}{\mathcal {M}}$$ and write dual pairings in the form $$\mu \, v$$ and compositions with linear mappings A in the form $$\mu \, A$$.

## 2 Manifold-Valued Constraints

Our method of choice to generalize equality and inequality constrained problems to manifolds is to replace the usual cone K, into which the constraint function g maps, by a submanifold with corners.

In the following we use $$0 \le k \le n$$ and write $$\mathbb {R}^k \times \{0\}^{n-k}$$ to denote the subset of $$\mathbb {R}^n$$ consisting of those elements whose last $$n-k$$ components vanish. We define the map $$W :\mathbb {R}^n \rightarrow \mathbb {R}^{n-k}$$ by $$W y {:}{=}(y_{k+1}, \dots , y_n)$$, so that $$\mathbb {R}^k \times \{0\}^{n-k} = \ker W$$. Further, as usual, $$v \le 0$$ in $$\mathbb {R}^\ell$$ means $$v_i \le 0$$ for $$i = 1, \dots , \ell$$.

### Definition 2.1

(Submanifold with corners [10]). Suppose that $$\mathcal {N}$$ is an n-dimensional $$C^2$$-manifold. A subset $$\mathcal {K}\subset \mathcal {N}$$ is called a submanifold with corners of dimension k if, for each $$q \in \mathcal {K}$$, there exists a local chart $$(\mathcal {U},\psi )$$ satisfying $$\psi (q) = 0$$, an index $$\ell$$ satisfying $$0 \le \ell \le k$$, and a surjective linear operator

\begin{aligned} A :\mathbb {R}^k \times \{0\}^{n-k} \rightarrow \mathbb {R}^\ell \end{aligned}

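such that

\begin{aligned} \psi (\mathcal {K}\cap \mathcal {U}) = \{ x \in \mathbb {R}^k \times \{0\}^{n-k} : A \, x \le 0 \} \cap \psi (\mathcal {U}) \end{aligned}

holds. In this case, $$(\mathcal {U},\psi )$$ is termed an adapted local chart centered at q.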

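We may identify A with a matrix $$({\widehat{A}}, 0) \in \mathbb {R}^{\ell \times n}$$, where $${\widehat{A}} \in \mathbb {R}^{\ell \times k}$$ and $$0 \in \mathbb {R}^{\ell \times (n-k)}$$. For $$x \in \mathbb {R}^k \times \{0\}^{n-k}$$, we then have $$A \, x = {\widehat{A}} \, (x_1, \dots , x_k)^\top$$.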

We refer to q in Definition 2.1 as a corner of index $$\ell$$. It has been shown in [10] that the index $$\ell$$, while it may depend on q, does not depend on the particular choice of the adapted local chart centered at q. In terms of optimization, $$\ell$$ describes the number of active inequality constraints at q. This generalizes the notion of vertices ($$\ell = k$$), edges ($$\ell = k-1$$), and higher-dimensional facets.

The requirement $$\ell \le k$$ is essential in this definition. In local charts, the description of a corner satisfies the linear independence constraint qualification (LICQ), because the rows of $${\widehat{A}}$$ are necessarily linearly independent to guarantee surjectivity. Thus, whenever $$({{\widetilde{\mathcal {U}}}}, {{\widetilde{\psi }}})$$ is a (non-adapted) local chart on $$\mathcal {N}$$ such that $${{\widetilde{\psi }}}(\mathcal {K}\cap {{\widetilde{\mathcal {U}}}})$$ is given by the nonlinear constraint $${\widetilde{A}}(x) \le 0$$ with $${\widetilde{A}}(0) = 0$$, we can use the surjective implicit function theorem to construct an adapted local chart $$\psi$$ such that $$\psi (\mathcal {K}\cap \mathcal {U})$$ is described by $${\widetilde{A}}'(0) \, x \le 0$$.

Definition 2.1 can be conceived as a straightforward generalization of the concepts

(i) of an embedded submanifold $$\mathcal {K}\subset \mathcal {N}$$, which is obtained when $$\ell = 0$$ holds for all $$q \in \mathcal {K}$$,

(ii) of a smoothly bounded subset $$\mathcal {K}\subset \mathcal {N}$$ with non-empty interior, which is obtained when $$k = n$$ and, for every $$q \in \mathcal {K}$$, either $$\ell = 0$$ (interior point) or $$\ell = 1$$ (boundary point) holds,

(iii) and of a convex polyhedron $$\mathcal {K}\subset \mathcal {N}= \mathbb {R}^n$$, whose corners satisfy the above regularity condition. In particular, the non-positive orthant $$\mathcal {K}= \mathbb {R}_-^n\subset \mathbb {R}^n$$ is a submanifold with corners of dimension n of $$\mathbb {R}^n$$. For instance, the origin $$q = 0$$ is a corner of index n and it can be described by $${\widehat{A}} = id _{\mathbb {R}^n}$$. As another example, the point $$q = -e_j$$ (the negative j-th unit vector in $$\mathbb {R}^n$$) is a corner of index $$n-1$$, and a local description of $$\mathcal {K}$$ can be defined via $${\widehat{A}} \in \mathbb {R}^{(n-1) \times n}$$ whose rows are $$e_i^\top$$ with $$1 \le i \le n$$, $$i \ne j$$.
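For the orthant example, the index of a corner and the rows of $${\widehat{A}}$$ can be read off from the zero components of the point. The following sketch illustrates this for $$n = k = 3$$; the helper names `corner_index` and `A_hat_rows` and the tolerance are assumptions made for illustration.

```python
def corner_index(y, k, tol=1e-12):
    """Number of active inequality constraints y_i = 0 among the first k components."""
    return sum(1 for y_i in y[:k] if abs(y_i) <= tol)


def A_hat_rows(y, k, tol=1e-12):
    """Rows e_i^T of A_hat, one for each active index i."""
    return [[1.0 if col == i else 0.0 for col in range(k)]
            for i in range(k) if abs(y[i]) <= tol]


origin = [0.0, 0.0, 0.0]
minus_e2 = [0.0, -1.0, 0.0]  # the point -e_j for j = 2

print(corner_index(origin, k=3))    # all three constraints are active
print(corner_index(minus_e2, k=3))  # the second constraint is inactive
print(A_hat_rows(minus_e2, k=3))    # rows e_1^T and e_3^T
```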

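Next we discuss tangent spaces in the context of submanifolds with corners. Among the various equivalent ways to define the tangent space for differentiable manifolds, we use the one given in [6, 10]. Let $$q \in \mathcal {N}$$ and consider the set

\begin{aligned} \{ (\psi , v_\psi ) : (\mathcal {U},\psi ) \text { is a chart of } \mathcal {N}\text { with } q \in \mathcal {U}, \; v_\psi \in \mathbb {R}^n \} . \end{aligned}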

For two charts $$\psi _1, \psi _2$$, we denote the transition map by $$T {:}{=}\psi _2 \circ \psi _1^{-1}$$. Define an equivalence relation $$(\psi _1,v_{\psi _1}) \sim (\psi _2,v_{\psi _2})$$ by

\begin{aligned} T'(\psi _1(q)) \, v_{\psi _1} = v_{\psi _2} . \end{aligned}

We call any corresponding equivalence class a tangent vector v of $$\mathcal {N}$$ at q and $$v_\psi$$ its representative in the chart $$\psi$$. For fixed $$q \in \mathcal {N}$$, the set of these equivalence classes is a vector space $$\mathcal {T}_{q}{\mathcal {N}}$$, termed the tangent space of $$\mathcal {N}$$ at q. The disjoint union of $$\mathcal {T}_{q}{\mathcal {N}}$$ over all $$q \in \mathcal {N}$$ can be endowed with the structure of a manifold, more accurately a vector bundle, termed the tangent bundle of $$\mathcal {N}$$.

Suppose now that $$\mathcal {K}$$ is a submanifold with corners of $$\mathcal {N}$$ of dimension k. For $$q \in \mathcal {K}$$, we define the tangent space $$\mathcal {T}_{q}{\mathcal {K}}$$ as the set of all $$v \in \mathcal {T}_{q}{\mathcal {N}}$$ which possess a representative $$v_\psi$$ in an adapted chart $$\psi$$ centered at q such that $$v_\psi$$ is an element of $$\mathbb {R}^k \times \{0\}^{n-k}$$. In this case, all representatives of v in all adapted charts centered at q satisfy the same relation. It is easy to verify that $$\mathcal {T}_{q}{\mathcal {K}}$$ is a linear subspace of $$\mathcal {T}_{q}{\mathcal {N}}$$ of dimension k. Notice that the dimension of $$\mathcal {T}_{q}{\mathcal {K}}$$ does not depend on the index of q as a corner of $$\mathcal {K}$$.

Further, the set $$\mathcal {T}_{q}^i{\mathcal {K}}$$ of inner tangent vectors is defined as all $$v \in \mathcal {T}_{q}{\mathcal {K}}$$ which satisfy, in addition, $$A \, v_\psi \le 0$$ for representatives in adapted charts centered at q. As discussed in [10], $$\mathcal {T}_{q}^i{\mathcal {K}}$$ is well-defined and it is a polyhedral convex cone. Similarly, we denote by $$\mathcal {T}_{q}^0{\mathcal {K}}$$ the linear subspace of all elements v of $$\mathcal {T}_{q}{\mathcal {K}}$$ for which the representatives in adapted charts centered at q satisfy $$A \, v_\psi = 0$$. We refer the reader to Figure 2 for an illustrative example.
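In an adapted chart, the three sets can be told apart by simple linear tests on the representative $$v_\psi$$. The sketch below returns the smallest of the sets that contains a given vector; the function name `classify_tangent`, the string labels, and the example data ($$n = 3$$, $$k = 2$$, one active constraint on the second coordinate) are hypothetical.

```python
def classify_tangent(v, A_hat, k, tol=1e-12):
    """Smallest of the sets T^0 K, T^i K, T K containing the chart representative v."""
    # Tangent vectors to K have vanishing last n-k components.
    if any(abs(v_i) > tol for v_i in v[k:]):
        return "not in T K"
    Av = [sum(a * v_j for a, v_j in zip(row, v[:k])) for row in A_hat]
    if all(abs(c) <= tol for c in Av):
        return "T^0 K"  # A v = 0
    if all(c <= tol for c in Av):
        return "T^i K"  # A v <= 0: inner tangent vector
    return "T K"        # tangent, but pointing out of K


A_hat = [[0.0, 1.0]]
print(classify_tangent([1.0, 0.0, 0.0], A_hat, k=2))
print(classify_tangent([0.0, -1.0, 0.0], A_hat, k=2))
print(classify_tangent([0.0, 1.0, 0.0], A_hat, k=2))
print(classify_tangent([0.0, 0.0, 1.0], A_hat, k=2))
```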

The following are our standing assumptions for the remainder of this paper.

### Assumption 2.2

Let $$\mathcal {M}$$ and $$\mathcal {N}$$ be $$C^2$$-manifolds of dimensions m and n, respectively. Moreover, let $$\mathcal {K}$$ be a submanifold with corners of $$\mathcal {N}$$ of dimension k. We further suppose that $$f \in C^2(\mathcal {M},\mathbb {R})$$ and $$g \in C^2(\mathcal {M},\mathcal {N})$$ hold and consider the following problem:

\begin{aligned} \begin{aligned} \text {Minimize} \quad&f(p) , \quad \text {where } p \in \mathcal {M}\\ \text {s. t.} \quad&g(p) \in \mathcal {K}. \end{aligned} \end{aligned}
(2.1)

Notice that products of submanifolds with corners are again submanifolds with corners. One can therefore easily combine several constraints, e. g., $$g_1(p) \in \mathcal {K}_1$$ and $$g_2(p) \in \mathcal {K}_2$$, into one single constraint mapping into a product manifold. We reiterate that (2.1) generalizes classical nonlinear programming problems with equality and inequality constraints. The latter are obtained in case $$\mathcal {M}= \mathbb {R}^m$$, $$\mathcal {N}= \mathbb {R}^n$$, $$\mathcal {K}= \mathbb {R}_-^k \times \{0\}^{n-k} \subset \mathbb {R}^k \times \mathbb {R}^{n-k}$$. At any $$p \in \mathcal {K}$$, an adapted local chart centered at p can be chosen as $$\varphi ({\tilde{p}}) = {\tilde{p}}-p$$, and $${\widehat{A}}$$ consists of the appropriate rows of $$id _{\mathbb {R}^k}$$.

Be aware that in general the feasible set $$\mathcal {F}{:}{=}g^{-1}(\mathcal {K}) \subset \mathcal {M}$$ is not a submanifold with corners even though $$\mathcal {K}$$ is. For example, consider p to be the tip of a pyramid $$\mathcal {P}$$ in $$\mathbb {R}^3$$, where $$\ell >3$$ planes meet. Then, locally near p, $$\mathcal {P}$$ is described by $$\ell >3$$ inequality constraints, and thus $$\mathcal {P}$$ cannot be a submanifold with corners of $$\mathbb {R}^3$$, because this would violate the condition $$\ell \le k = 3$$ in Definition 2.1. Nevertheless, with a suitable affine mapping $$g :\mathbb {R}^3 \rightarrow \mathbb {R}^\ell$$, $$\mathcal {P}$$ can be described locally as $$\mathcal {P}= g^{-1}(\mathbb {R}_-^\ell )$$. Thus, by means of the constraint mapping g we can obtain feasible sets more general than submanifolds with corners of $$\mathcal {M}$$. Also in view of practical computational approaches, the set $$\mathcal {K}$$ should have a simple structure, allowing, e. g., a local representation in computable adapted charts.
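The pyramid example can be checked numerically. In the sketch below, the four outward normals describe a hypothetical square pyramid with apex at the origin, so that $$g(x) = N x$$ with the matrix N of normals and $$\mathcal {P}= g^{-1}(\mathbb {R}_-^4)$$ locally. The exact rank routine is a plain Gaussian elimination written for this illustration.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


# Outward normals of the four side faces; the pyramid is {x : n_i . x <= 0 for all i}.
normals = [[1, 0, 1], [-1, 0, 1], [0, 1, 1], [0, -1, 1]]

print(len(normals), rank(normals))  # 4 active constraints, but rank 3
```

Four constraints are active at the apex while their normals span only $$\mathbb {R}^3$$, so no surjective operator onto $$\mathbb {R}^4$$ can describe the apex in a chart of $$\mathbb {R}^3$$.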

Suppose that $$\varphi :\mathcal {M}\supset \mathcal {U}_p \rightarrow \mathbb {R}^m$$ is a chart centered at p and that $$\psi :\mathcal {N}\supset \mathcal {U}_{g(p)} \rightarrow \mathbb {R}^n$$ is a chart centered at g(p). We may then define the following local representations of f and g:

\begin{aligned} f_\varphi {:}{=}f \circ \varphi ^{-1} :\varphi (\mathcal {U}_p) \rightarrow \mathbb {R}, \quad g_{\psi ,\varphi } {:}{=}\psi \circ g \circ \varphi ^{-1} :\varphi (\mathcal {U}_p) \rightarrow \mathbb {R}^n \end{aligned}

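and obtain the following classical constrained optimization problem locally:

\begin{aligned} \begin{aligned} \text {Minimize} \quad&f_\varphi (x) , \quad \text {where } x \in \varphi (\mathcal {U}_p) \\ \text {s. t.} \quad&g_{\psi ,\varphi }(x) \in \psi (\mathcal {K}\cap \mathcal {U}_{g(p)}) . \end{aligned} \end{aligned}
(2.2)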

As a general strategy, we will carry over results on first- and second-order optimality conditions from (2.2) to (2.1) by formulations that are independent of the local representation in charts. We will use rather straightforward and well established strategies of proof but highlight invariance considerations which arise in the differential geometric context.

### Example 2.3

Consider the standard case, i.e., $$\mathcal {N}= \mathbb {R}^{n_I+n_E}$$ and

\begin{aligned} \begin{aligned} g_I(x)&\le 0&\text {in } \mathbb {R}^{n_I} , \\ g_E(x)&= 0&\text {in } \mathbb {R}^{n_E} . \end{aligned} \end{aligned}

This fits into our general setting (1.3) if we define

\begin{aligned} g {:}{=}(g_I, g_E) \quad \text {and} \quad \mathcal {K}{:}{=}\mathbb {R}_-^{n_I} \times \{0\}^{n_E} . \end{aligned}

This set $$\mathcal {K}$$ is a submanifold of $$\mathcal {N}$$ with corners of dimension $$k = n_I$$. An adapted chart at a point $$y \in \mathcal {K}$$ can be defined by $$\varphi (\eta ) = \eta - y$$ and by choosing the chart domain $$\mathcal {U}$$ as an open ball about y whose radius is small enough that every inequality constraint which is inactive at y remains inactive on $$\mathcal {U}$$. The index $$\ell$$ of any point $$y \in \mathcal {K}$$ equals the number of components $$1 \le i \le n_I$$ for which $$y_i = 0$$ holds. Then the linear mapping $${\widehat{A}} \in \mathbb {R}^{\ell \times k}$$ consists of rows equal to $$e_i^\top$$ (the i-th unit vector in $$\mathbb {R}^k$$) for each index i with $$y_i = 0$$. For any $$y \in \mathcal {K}$$, the tangent space $$\mathcal {T}_{y}{\mathcal {K}}$$ (in its representation w.r.t. the chart $$\varphi (\eta ) = \eta - y$$) is given by $$\mathbb {R}^{n_I} \times \{0\}^{n_E}$$. At a point y with $$y_1 < 0$$ and $$y_i = 0$$ for $$2 \le i \le n_I$$, for instance, the cone of inner tangent vectors is described by $$\mathbb {R}\times \mathbb {R}_-^{n_I-1} \times \{0\}^{n_E}$$, while the subspace $$\mathcal {T}_{y}^0{\mathcal {K}}$$ is equal to $$\mathbb {R}\times \{0\}^{n_I-1} \times \{0\}^{n_E}$$.

### Example 2.4

Consider a geodesic polyhedron $$\mathcal {K}\subset \mathcal {N}$$ on a Riemannian manifold $$\mathcal {N}$$, i.e., a set whose facets are totally geodesic submanifolds as in Figure 1; cf., e. g., [6, Ch. XI, §4]. We may use the logarithmic map to construct an adapted local chart at a point $$q \in \mathcal {K}$$. Then $$\mathcal {K}$$ can be represented as $$A \, v_\psi \le 0$$ and $$\mathcal {K}$$ is a manifold with corners, provided that A (which depends on q and $$\psi$$, of course) is surjective at any $$q \in \mathcal {K}$$.

### Example 2.5

Given two mappings $$g_\ell , g_r :\mathcal {M}\rightarrow \mathcal {N}$$, consider the equality constraint

\begin{aligned} g_\ell (p) = g_r(p) . \end{aligned}

Since $$\mathcal {N}$$ in general is not a vector space, this constraint cannot be written in the usual form $$g_\ell (p) - g_r(p) = 0$$. However, it can be formulated as $$g(p) \in \mathcal {K}$$ via the mapping

\begin{aligned} g :\mathcal {M}\ni p \mapsto (g_\ell (p), g_r(p)) \in \mathcal {N}\times \mathcal {N}\end{aligned}

with $$\mathcal {K}{:}{=}\{ (q,q) : q \in \mathcal {N}\}$$, the diagonal submanifold of $$\mathcal {N}\times \mathcal {N}$$.

### Example 2.6

Consider a vector bundle $$\pi :\mathcal {N}\rightarrow \mathcal {B}$$, where $$\mathcal {B}$$ and $$\mathcal {N}$$ are smooth manifolds and $$\pi$$ is a smooth surjective map. In fact, the total space $$\mathcal {N}$$ of a vector bundle is a manifold with special structure in the sense that, for each q in the base manifold $$\mathcal {B}$$, the preimages $$\pi ^{-1}(q)$$ (called fibres) are linear spaces; see, e. g., [6, Ch. III].

In applications, a constraint mapping $$g :\mathcal {M}\rightarrow \mathcal {N}$$ of the form

\begin{aligned} g(p) = 0_{\pi (g(p))} \end{aligned}

arises frequently, in particular when $$\mathcal {N}= \mathcal {T}\mathcal {B}$$ or $$\mathcal {N}= \mathcal {T}^*\mathcal {B}$$ is the tangent bundle or cotangent bundle over $$\mathcal {B}$$, respectively. Since the mapping $$q \mapsto 0_{\pi (q)}$$ is well-defined and smooth on vector bundles, this constraint is of the form discussed in Example 2.5.

If the fibres $$\pi ^{-1}(q)$$ of $$\mathcal {N}$$ are equipped with preorder cones $$K_q \subset \pi ^{-1}(q)$$, then also inequality constraints of the form

\begin{aligned} g(p) \le 0_{\pi (g(p))} , \quad \text {i.e.,} \quad g(p) \in -K_{\pi (g(p))} \end{aligned}

can be included under suitable assumptions on the choice of cones.

## 3 Constraint Qualifications

We recapitulate the definition of the tangent cone of a subset $$\mathcal {F}\subset \mathcal {M}$$ and generalize basic results, known for optimization problems on vector spaces, to the case of manifolds with corners. We recall that $$t_k \searrow 0$$ denotes a sequence of strictly positive real numbers that converges to 0.

### Definition 3.1

(Tangent cone). Let $$p \in \mathcal {F}$$ and $$(\mathcal {U},\varphi )$$ be a chart centered at p. A tangent vector $$v \in \mathcal {T}_{p}{\mathcal {M}}$$ is said to belong to the tangent cone $$\mathcal {C}_{p}{\mathcal {F}}$$ at p if there exists a representative $$v_\varphi$$ in the chart $$\varphi$$ and sequences $$t_k \searrow 0$$ and $$x_{\varphi ,k} \in \mathbb {R}^m$$ such that

\begin{aligned} x_{\varphi ,k} \rightarrow v_\varphi \text { and } t_k \, x_{\varphi ,k} \in \varphi (\mathcal {F}\cap \mathcal {U}) \end{aligned}
(3.1)

holds. We then call $$x_{\varphi ,k}$$ a feasible tangential sequence for $$v_\varphi$$.

The following result shows that Definition 3.1 does not depend on the chosen chart. Indeed, the tangent cone can alternatively be defined without the use of a chart; compare [2, Def. 3.2].

### Lemma 3.2

Property (3.1) holds for one representative of $$v \in \mathcal {T}_{p}{\mathcal {M}}$$ if and only if it holds for every representative of v.

### Proof

Consider two local charts $$\varphi _1$$ and $$\varphi _2$$ centered at p and their smooth transition map $$T = \varphi _2 \circ \varphi _1^{-1}$$, defined in a neighborhood $$\mathcal {U}$$ of $$0 = \varphi _1(p) = \varphi _2(p) = T(0)$$. Then the corresponding representatives $$v_{\varphi _1}$$ and $$v_{\varphi _2}$$ of a tangent vector are related by $$v_{\varphi _2} = T'(0) \, v_{\varphi _1}$$. By differentiability of T we obtain (for sufficiently large k so that $$t_k \, x_{\varphi _1,k} \in \mathcal {U}$$):

\begin{aligned} x_{\varphi _2,k} {:}{=}\frac{T(t_k \, x_{\varphi _1,k})-T(0)}{t_k} \rightarrow T'(0) \, v_{\varphi _1} = v_{\varphi _2} \end{aligned}

for any pair of sequences $$x_{\varphi _1,k} \rightarrow v_{\varphi _1}$$ and $$t_k \searrow 0$$. Hence, $$v_{\varphi _1}$$ satisfies (3.1) if and only if $$v_{\varphi _2}$$ does. $$\square$$
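The convergence of the difference quotient used in this proof can be observed numerically. The transition map `T` below is a hypothetical smooth map with $$T(0) = 0$$ and $$T'(0) = \mathrm {diag}(2, 1)$$, and the sequence $$x_{\varphi _1,k}$$ is an arbitrary choice converging to $$v_{\varphi _1}$$; neither is taken from the original text.

```python
def T(x):
    """Hypothetical transition map with T(0) = 0 and T'(0) = [[2, 0], [0, 1]]."""
    return (2.0 * x[0] + x[1] ** 2, x[1] + x[0] * x[1])


v1 = (1.0, -1.0)           # representative in the first chart
v2 = (2.0 * v1[0], v1[1])  # T'(0) v1, representative in the second chart


def quotient_error(t):
    """Distance between the difference quotient (T(t x) - T(0)) / t and T'(0) v1."""
    x1 = (v1[0] + t, v1[1] - t)  # a sequence element x_{1,k} close to v1
    Tx = T((t * x1[0], t * x1[1]))
    x2 = (Tx[0] / t, Tx[1] / t)  # uses T(0) = 0
    return max(abs(x2[0] - v2[0]), abs(x2[1] - v2[1]))


for k in range(1, 7):
    print(k, quotient_error(10.0 ** (-k)))
```

The printed error shrinks with $$t_k$$, in line with $$x_{\varphi _2,k} \rightarrow v_{\varphi _2}$$.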

Obviously, $$\mathcal {C}_{p}{\mathcal {F}}$$ is a cone and $$0 \in \mathcal {C}_{p}{\mathcal {F}}$$. Furthermore, it is closed. To see this, consider a sequence $$v^i \in \mathcal {C}_{p}{\mathcal {F}}$$ which converges to $$v \in \mathcal {T}_{p}{\mathcal {M}}$$ with $$v \ne 0$$. Using a chart, we have sequences $$t_k^i \searrow 0$$ and $$x_{\varphi ,k}^i \rightarrow v_\varphi ^i$$. From these, appropriate diagonal sequences can be chosen to verify $$v \in \mathcal {C}_{p}{\mathcal {F}}$$.

The following simple lemma can be proved as in the standard case:

### Lemma 3.3

Let $$f \in C^1(\mathcal {M},\mathbb {R})$$ and assume that p is a local minimizer of f on a set $$\mathcal {F}\subset \mathcal {M}$$. Then $$f'(p) \, v \ge 0$$ holds for all $$v \in \mathcal {C}_{p}{\mathcal {F}}$$.

### Proof

Consider $$v \in \mathcal {C}_{p}{\mathcal {F}}$$ and a corresponding feasible tangential sequence $$x_{\varphi ,k} \rightarrow v_\varphi$$ with $$t_k \searrow 0$$ and $$t_k \, x_{\varphi ,k} \in \varphi (\mathcal {F}\cap \mathcal {U})$$. Then, by optimality, $$t_k^{-1}(f_\varphi (t_k \, x_{\varphi ,k}) - f_\varphi (0)) \ge 0$$ holds for $$k \in \mathbb {N}$$ sufficiently large. Since $$x_{\varphi ,k} \rightarrow v_\varphi$$ we obtain $$f_\varphi '(0) \, x_{\varphi ,k} \rightarrow f_\varphi '(0) \, v_\varphi$$, and since $$t_k^{-1}(f_\varphi (t_k \, x_{\varphi ,k}) - f_\varphi (0)) - f_\varphi '(0) \, x_{\varphi ,k} \rightarrow 0$$ by differentiability, this limit has to be non-negative. $$\square$$

The following result shows that the tangent cone to a submanifold with corners has a particularly simple structure since it agrees with the cone of inner tangent vectors defined in Sect. 2:

### Proposition 3.4

Suppose that $$\mathcal {K}$$ is a submanifold with corners of $$\mathcal {N}$$ and $$q \in \mathcal {K}$$. Then

\begin{aligned} \mathcal {C}_{q}{\mathcal {K}} = \mathcal {T}_{q}^i{\mathcal {K}} . \end{aligned}

### Proof

Let $$v \in \mathcal {T}_{q}{\mathcal {N}}$$. Consider an adapted local chart $$\psi$$ of $$\mathcal {K}\subset \mathcal {N}$$, centered at q, and defined on a neighborhood $$\mathcal {U}$$ of q, and $$v_\psi$$ the corresponding representative of v. Since both $$\mathcal {C}_{q}{\mathcal {K}}$$ and $$\mathcal {T}_{q}^i{\mathcal {K}}$$ are cones, we may assume w.l.o.g. that $$\lambda \psi (\mathcal {K}\cap \mathcal {U}) \subset \psi (\mathcal {K}\cap \mathcal {U})$$ and $$\lambda v_\psi \in \psi (\mathcal {U})$$ for $$\lambda \in [0,1]$$. Two cases can occur. If $$v_\psi \in \psi (\mathcal {K})$$, then $$v \in \mathcal {T}_{q}^i{\mathcal {K}}$$ holds by definition, and $$v \in \mathcal {C}_{q}{\mathcal {K}}$$ follows because $$t_k \, v_\psi \in \psi (\mathcal {K}\cap \mathcal {U})$$ is clearly a tangential sequence. By contrast, if $$v_\psi \not \in \psi (\mathcal {K})$$, then $$v \not \in \mathcal {T}_{q}^i{\mathcal {K}}$$ by definition. Moreover,

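\begin{aligned} \varepsilon {:}{=}{{\,\mathrm{dist}\,}}\bigl ( v_\psi , \psi (\mathcal {K}\cap \mathcal {U}) \bigr ) > 0 \end{aligned}

because $$\psi (\mathcal {K}\cap \mathcal {U})$$ is closed in $$\psi (\mathcal {U})$$. Suppose that sequences $$x_k \rightarrow v_\psi$$ and $$t_k \searrow 0$$ satisfy $$t_k \, x_k \in \psi (\mathcal {K}\cap \mathcal {U})$$. Since the set described by $$A \, x \le 0$$ in $$\mathbb {R}^k \times \{0\}^{n-k}$$ is a cone and $$\psi (\mathcal {U})$$ is open, we obtain $$x_k \in \psi (\mathcal {K}\cap \mathcal {U})$$ for sufficiently large k, and therefore

\begin{aligned} \varepsilon \le \Vert x_k - v_\psi \Vert \rightarrow 0 , \end{aligned}

which is a contradiction. Hence, there is no feasible tangential sequence for $$v_\psi$$. $$\square$$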

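In the following we consider the linearization

\begin{aligned} g'(p) :\mathcal {T}_{p}{\mathcal {M}} \rightarrow \mathcal {T}_{g(p)}{\mathcal {N}} \end{aligned}

of g at p. Its representation in a local chart $$\varphi$$, centered at p, and an adapted local chart $$\psi$$, centered at g(p), reads: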

\begin{aligned} g_{\psi ,\varphi }'(0) {:}{=}(\psi \circ g \circ \varphi ^{-1})'(0) :\mathbb {R}^m \rightarrow \mathbb {R}^n . \end{aligned}

### Definition 3.5

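(Linearizing cone). The linearizing cone at a point $$p \in \mathcal {F}$$ is defined as

\begin{aligned} \mathcal {L}_{p}({g},{\mathcal {K}}) {:}{=}\{ v \in \mathcal {T}_{p}{\mathcal {M}} : g'(p) \, v \in \mathcal {T}_{g(p)}^i{\mathcal {K}} \} . \end{aligned}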

### Lemma 3.6

We have $$\mathcal {C}_{p}{\mathcal {F}} \subset \mathcal {L}_{p}({g},{\mathcal {K}})$$.

### Proof

Consider $$v \in \mathcal {C}_{p}{\mathcal {F}}$$, its representation in a chart $$v_{\varphi }$$ and corresponding sequences $$t_k \searrow 0$$ and $$x_{\varphi ,k} \rightarrow v_\varphi$$, where $$g_{\psi ,\varphi }(t_k \, x_{\varphi ,k}) \in \psi (\mathcal {K})$$. We obtain:

\begin{aligned} A \, g_{\psi ,\varphi }(t_k \, x_{\varphi ,k}) \le 0 , \quad A \, g_{\psi ,\varphi }(0) = 0 . \end{aligned}

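By differentiability of $$g_{\psi ,\varphi }$$ and since $$g_{\psi ,\varphi }(0) = 0$$, it follows that

\begin{aligned} \frac{1}{t_k} \, g_{\psi ,\varphi }(t_k \, x_{\varphi ,k}) \rightarrow g_{\psi ,\varphi }'(0)(v_\varphi ) \end{aligned}

and thus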

\begin{aligned} \frac{1}{t_k} A \, g_{\psi ,\varphi }(t_k \, x_{\varphi ,k}) \rightarrow A \, g_{\psi ,\varphi }'(0)(v_\varphi ) . \end{aligned}

Since every component of the left-hand side is non-positive, its limit cannot be positive. Thus, $$A \, g_{\psi ,\varphi }'(0)(v_\varphi ) \le 0$$ and similarly $$W g_{\psi ,\varphi }'(0)(v_\varphi ) = 0$$. This implies $$v \in \mathcal {L}_{p}({g},{\mathcal {K}})$$. $$\square$$

### Definition 3.7

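The (description of the) feasible set $$\mathcal {F}$$ is called transversal over $$\mathcal {K}$$ at $$p \in \mathcal {F}$$ if

\begin{aligned} {{\,\mathrm{image}\,}}g'(p) - \mathcal {T}_{g(p)}{\mathcal {K}} = \mathcal {T}_{g(p)}{\mathcal {N}} . \end{aligned}

It is said to satisfy the Zowe–Kurcyusz–Robinson constraint qualification (ZKRCQ, compare [18]) at $$p \in \mathcal {F}$$ if

\begin{aligned} {{\,\mathrm{image}\,}}g'(p) - \mathcal {T}_{g(p)}^i{\mathcal {K}} = \mathcal {T}_{g(p)}{\mathcal {N}} . \end{aligned}

It is said to satisfy the linear independence constraint qualification (LICQ) at $$p \in \mathcal {F}$$ if

\begin{aligned} {{\,\mathrm{image}\,}}g'(p) - \mathcal {T}_{g(p)}^0{\mathcal {K}} = \mathcal {T}_{g(p)}{\mathcal {N}} . \end{aligned}

Clearly, since $$\mathcal {T}_{g(p)}^0{\mathcal {K}} \subset \mathcal {T}_{g(p)}^i{\mathcal {K}} \subset \mathcal {T}_{g(p)}{\mathcal {K}}$$ holds, (LICQ) implies (ZKRCQ), which in turn implies transversality. If the index $$\ell$$ of g(p) satisfies $$\ell = 0$$, i.e., g(p) is not a corner of positive index, then all above notions are equivalent, because $$\mathcal {T}_{g(p)}^0{\mathcal {K}} = \mathcal {T}_{g(p)}^i{\mathcal {K}} = \mathcal {T}_{g(p)}{\mathcal {K}}$$ holds in this case.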

### Proposition 3.8

If (ZKRCQ) holds, then $$\mathcal {C}_{p}{\mathcal {F}} = \mathcal {L}_{p}({g},{\mathcal {K}})$$.

### Proof

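As above, consider a chart $$\varphi$$ of $$\mathcal {M}$$ centered at p and an adapted chart $$\psi$$ of $$\mathcal {N}$$ centered at g(p). Then the feasible set is represented locally as follows:

\begin{aligned} \mathcal {F}_{\psi ,\varphi } {:}{=}\{ x \in \varphi (\mathcal {U}_p) : W g_{\psi ,\varphi }(x) = 0 , \; A \, g_{\psi ,\varphi }(x) \le 0 \} , \end{aligned}

while the representation of the linearizing cone is:

\begin{aligned} \mathcal {L}_{p}({g},{\mathcal {K}})_{\psi ,\varphi } {:}{=}\{ v_\varphi \in \mathbb {R}^m : W g_{\psi ,\varphi }'(0) \, v_\varphi = 0 , \; A \, g_{\psi ,\varphi }'(0) \, v_\varphi \le 0 \} . \end{aligned}
(3.2)

Then (ZKRCQ) can be written as:

\begin{aligned} {{\,\mathrm{image}\,}}g_{\psi ,\varphi }'(0) - \{ y \in \mathbb {R}^n : W y = 0 , \; A \, y \le 0 \} = \mathbb {R}^n . \end{aligned}
(3.3)

Under assumption (ZKRCQ), we can apply [18] to conclude that $$\mathcal {L}_{p}({g},{\mathcal {K}})_{\psi ,\varphi }$$ coincides with the tangent cone of $$\mathcal {F}_{\psi ,\varphi }$$ at 0 in $$\mathbb {R}^m$$, which is, by Lemma 3.2, a representative of $$\mathcal {C}_{p}{\mathcal {F}}$$. Since both sets are representatives of subsets of $$\mathcal {T}_{p}{\mathcal {M}}$$, we conclude the result as claimed. $$\square$$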

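Using the local representation (3.2), where our constraints are split into equality and inequality constraints, we can formulate the Mangasarian–Fromovitz constraint qualification (MFCQ) in the following way: (MFCQ) holds at p if

\begin{aligned} W g_{\psi ,\varphi }'(0) \text { is surjective and there exists } {\hat{x}} \in \mathbb {R}^m \text { such that } W g_{\psi ,\varphi }'(0) \, {\hat{x}} = 0 \text { and } A \, g_{\psi ,\varphi }'(0) \, {\hat{x}} < 0 . \end{aligned}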

### Proposition 3.9

(MFCQ) and (ZKRCQ) are equivalent.

### Proof

Let (MFCQ) hold and $$y \in \mathbb {R}^n$$ be arbitrary. Define $${\hat{y}} {:}{=}g_{\psi ,\varphi }'(0) \, {\hat{x}} \in {{\,\mathrm{image}\,}}g_{\psi ,\varphi }'(0)$$. In addition, since $$W g_{\psi ,\varphi }'(0)$$ is surjective, there is $${\tilde{x}}$$, such that $$W g_{\psi ,\varphi }'(0) \, {\tilde{x}} = W y$$ and we define $${\tilde{y}} {:}{=}g_{\psi ,\varphi }'(0) \, {\tilde{x}}$$. Then we can write for any $$\alpha > 0$$:

\begin{aligned} y = (\alpha \, {\hat{y}} + {\tilde{y}}) - (\alpha \, {\hat{y}} + {\tilde{y}} - y), \end{aligned}

where $$\alpha \, {\hat{y}} + {\tilde{y}} \in {{\,\mathrm{image}\,}}g_{\psi ,\varphi }'(0)$$. By construction, $$W(\alpha \, {\hat{y}} + {\tilde{y}} - y) = 0$$ holds, and choosing $$\alpha$$ sufficiently large we also obtain $$A(\alpha \, {\hat{y}} + {\tilde{y}} - y) \le 0$$, because $$A {\hat{y}} < 0$$. This shows (3.3) and thus (ZKRCQ).

If (ZKRCQ) holds, then for any $$y \in \mathbb {R}^n$$ there is $${\hat{y}} \in {{\,\mathrm{image}\,}}g_{\psi ,\varphi }'(0)$$, such that $$W {\hat{y}} = W y$$ and $$A {\hat{y}} \le Ay$$, because $$y = {\hat{y}} - ({\hat{y}} - y)$$ with $$W({\hat{y}} - y) = 0$$ and $$A({\hat{y}} - y) \le 0$$. Thus, since W and A are surjective by definition of manifolds with corners, $$W g_{\psi ,\varphi }'(0)$$ is surjective as well, and we find y such that $$W y = 0$$ and $$A y < 0$$, and thus also $${\hat{y}} = g_{\psi ,\varphi }'(0) \, {\hat{x}}$$ with the same properties. So (MFCQ) holds. $$\square$$
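The first half of this proof is constructive, and the decomposition can be traced on a small example. The data below ($$n = 3$$, $$k = \ell = 2$$, $$m = 2$$, and the matrix `G` standing in for $$g_{\psi ,\varphi }'(0)$$) are hypothetical; in this example `G` is not surjective, so the split is not trivial. The shortcut used to compute $${\tilde{x}}$$ is specific to this `G`.

```python
def matvec(M, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]      # inequality part (two active constraints)
W = [[0.0, 0.0, 1.0]]                       # equality part
G = [[-1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]  # stands in for g'_{psi,phi}(0)
x_hat = [1.0, 0.0]                          # MFCQ direction: W G x_hat = 0, A G x_hat < 0


def zkrcq_split(y):
    """Write y = r - c with r in image G, W c = 0 and A c <= 0."""
    y_hat = matvec(G, x_hat)
    x_tilde = [0.0, matvec(W, y)[0]]  # solves W G x_tilde = W y for this particular G
    y_tilde = matvec(G, x_tilde)
    slack = matvec(A, [a - b for a, b in zip(y_tilde, y)])
    A_y_hat = matvec(A, y_hat)        # strictly negative by MFCQ
    # Smallest alpha >= 0 with alpha * A y_hat + A (y_tilde - y) <= 0.
    alpha = max([0.0] + [s / -d for s, d in zip(slack, A_y_hat)])
    r = [alpha * a + b for a, b in zip(y_hat, y_tilde)]
    c = [ri - yi for ri, yi in zip(r, y)]
    return r, c


r, c = zkrcq_split([1.0, -2.0, 5.0])
print(r, c, matvec(W, c), matvec(A, c))
```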

### Proposition 3.10

$$\mathcal {F}$$ satisfies (LICQ) at $$p \in \mathcal {F}$$ if and only if, for every representation in charts, the following linear mapping is surjective:

\begin{aligned} B \, g_{\psi ,\varphi }'(0) :\mathbb {R}^m \rightarrow \mathbb {R}^\ell \times \mathbb {R}^{n-k} , \quad \text {where } B {:}{=}\begin{pmatrix} A \\ W \end{pmatrix} . \end{aligned}

### Proof

Let $$v \in \mathcal {T}_{g(p)}{\mathcal {N}}$$ with representative $$v_\psi \in \mathbb {R}^n$$. If $$B \, g_{\psi ,\varphi }'(0)$$ is surjective, then we find $$w_\varphi \in \mathbb {R}^m$$, such that $$B \, g_{\psi ,\varphi }'(0) \, w_\varphi = B \, v_\psi$$. This implies that $$v^0_\psi {:}{=}g_{\psi ,\varphi }'(0) \, w_\varphi - v_\psi \in \ker B$$ and we may write $$v_\psi = g_{\psi ,\varphi }'(0) \, w_\varphi - v^0_\psi$$. Thus, we have found $$w \in \mathcal {T}_{p}{\mathcal {M}}$$ and $$v^0 \in \mathcal {T}_{g(p)}^0{\mathcal {K}}$$, such that $$v = g'(p)w-v^0$$.

If, conversely, (LICQ) holds, then we can write $$v_\psi = g_{\psi ,\varphi }'(0) \, w_\varphi - v^0_\psi$$ for any $$v_\psi \in \mathbb {R}^n$$ with $$v^0_\psi \in \ker B$$ and thus $$B \, g_{\psi ,\varphi }'(0) \, w_\varphi = B \, v_\psi$$. Hence the surjectivity of $$B \, g_{\psi ,\varphi }'(0)$$ follows from the surjectivity of B, which holds by Definition 2.1 of a submanifold with corners. $$\square$$

## 4 First-Order Optimality Conditions

In this section we address the first-order necessary optimality conditions for (2.1) under the constraint qualification (ZKRCQ). To this end, we recall that

\begin{aligned} S^\circ {:}{=}\{ \mu \in V^* : \mu \, s \le 0 \text { for all } s \in S \} \end{aligned}

denotes the polar cone of an arbitrary set $$S \subset V$$ of a normed vector space V.

### Theorem 4.1

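Suppose that $$p_* \in \mathcal {F}$$ is a local minimizer of (2.1) such that (ZKRCQ) holds at $$p_*$$. Then there exists a Lagrange multiplier $$\mu \in \mathcal {T}^*_{g(p_*)}{\mathcal {N}}$$ such that the following KKT conditions hold:

\begin{aligned} f'(p_*) + \mu \, g'(p_*) = 0 \quad \text {on } \mathcal {T}_{p_*}{\mathcal {M}} , \end{aligned}
(4.1a)
\begin{aligned} \mu \in \bigl ( \mathcal {T}_{g(p_*)}^i{\mathcal {K}} \bigr )^\circ . \end{aligned}
(4.1b)

The set $$\varLambda (p_*)$$ of all possible Lagrange multipliers is compact. If (LICQ) holds, then $$\varLambda (p_*)$$ is a singleton.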

### Proof

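By Lemma 3.3 we have $$f'(p_*) \ge 0$$ on $$\mathcal {C}_{p_*}{\mathcal {F}}$$ and thus, by Proposition 3.8, on $$\mathcal {L}_{p_*}({g},{\mathcal {K}})$$. Hence $$v = 0$$ is a minimizer of the following linear problem:

\begin{aligned} \begin{aligned} \text {Minimize} \quad&f'(p_*) \, v , \quad \text {where } v \in \mathcal {T}_{p_*}{\mathcal {M}} \\ \text {s. t.} \quad&g'(p_*) \, v \in \mathcal {T}_{g(p_*)}^i{\mathcal {K}} . \end{aligned} \end{aligned}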

Due to the (ZKRCQ) regularity condition, we can once more apply the results of [18] to this problem to conclude the existence of a Lagrange multiplier $$\mu$$ such that the KKT conditions (4.1) hold, so $$\varLambda (p_*)$$ is non-empty. Being the intersection of closed sets, $$\varLambda (p_*)$$ is also closed.

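In order to prove the boundedness of $$\varLambda (p_*)$$, we proceed by contradiction. Consider a sequence $$\mu _k$$ of Lagrange multipliers with $$\Vert \mu _k\Vert \rightarrow \infty$$ and the corresponding bounded sequence $$\lambda _k {:}{=}\mu _k / \Vert \mu _k\Vert$$ with $$\Vert \lambda _k\Vert = 1$$. By picking a subsequence we may assume that $$\lambda _k$$ converges to a limit $$\lambda _*$$ with $$\Vert \lambda _*\Vert = 1$$. Due to (ZKRCQ), every $$v \in \mathcal {T}_{g(p_*)}{\mathcal {N}}$$ can be written as $$v = w-u$$, where $$w \in {{\,\mathrm{image}\,}}g'(p_*)$$ and $$u \in \mathcal {T}_{g(p_*)}^i{\mathcal {K}}$$. Then we compute

\begin{aligned} \lambda _k \, v = \frac{1}{\Vert \mu _k\Vert } \big ( (\mu _k - \mu _1) \, w - \mu _k \, u + \mu _1 \, w \big ) . \end{aligned}

Since $$(\mu _k - \mu _1) \, w = 0$$ and $$\mu _k \, u \le 0$$ hold by the KKT conditions (4.1), and the last addend in the sum tends to 0 as $$k \rightarrow \infty$$, it follows that $$\lambda _* \, v \ge 0$$ holds for all $$v \in \mathcal {T}_{g(p_*)}{\mathcal {N}}$$ and thus $$\lambda _* = 0$$, which contradicts $$\Vert \lambda _*\Vert = 1$$. Hence, $$\varLambda (p_*)$$ is bounded and therefore compact.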

Now consider two solutions $$\mu _1$$ and $$\mu _2$$ of (4.1). Then $$(\mu _1 - \mu _2) \, v^0 = 0$$ for all $$v^0 \in \mathcal {T}_{g(p_*)}^0{\mathcal {K}}$$ and $$(\mu _1 - \mu _2) \, g'(p_*) = 0$$. Hence, for all $$v \in {{\,\mathrm{image}\,}}g'(p_*) - \mathcal {T}_{g(p_*)}^0{\mathcal {K}}$$, it follows that $$(\mu _1 - \mu _2) \, v = 0$$. If (LICQ) holds, then this implies $$(\mu _1 - \mu _2) \, v = 0$$ for all $$v \in \mathcal {T}_{g(p_*)}{\mathcal {N}}$$ and thus $$\mu _1 = \mu _2$$. $$\square$$

In the following, we derive a representation $$\mu _\psi \in \mathbb {R}^n$$ of $$\mu \in \big ( \mathcal {T}_{g(p)}^i{\mathcal {K}} \big )^\circ$$ with respect to an adapted local chart $$\psi$$ centered at $$g(p)$$. Recall that, by definition, $$v \in \mathcal {T}_{g(p)}^i{\mathcal {K}}$$ holds if and only if $$W v_\psi = 0$$ and $$A \, v_\psi \le 0$$.

### Proposition 4.2

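$$\mu \in \big ( \mathcal {T}_{g(p)}^i{\mathcal {K}} \big )^\circ$$ holds if and only if its representation $$\mu _\psi$$ in an adapted chart is of the following form:

\begin{aligned} \mu _\psi = \lambda _I \, A + \lambda _E \, W , \end{aligned}

where $$\lambda _I \in \mathbb {R}^\ell$$ with $$\lambda _I \ge 0$$ and $$\lambda _E \in \mathbb {R}^{n-k}$$. Hence, in local charts, (4.1) reads:

\begin{aligned} 0 = f_\varphi '(0) + (\lambda _I \, A + \lambda _E \, W) \, g_{\psi ,\varphi }'(0) , \quad \lambda _I \ge 0 . \end{aligned}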

### Proof

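Consider a representative $$v_\psi$$ of an element of $$\mathcal {T}_{g(p)}^i{\mathcal {K}}$$ and $$\mu _\psi$$ of the claimed form:

\begin{aligned} \mu _\psi \, v_\psi = \lambda _I \, A \, v_\psi + \lambda _E \, W \, v_\psi = \lambda _I \, A \, v_\psi \le 0 . \end{aligned}

Hence, $$\mu \in \big ( \mathcal {T}_{g(p)}^i{\mathcal {K}} \big )^\circ$$.

For the converse, assume that $$(\lambda _I)_i < 0$$ for some $$1\le i \le \ell$$. Since $$B = \begin{pmatrix} A \\ W \end{pmatrix}$$ is surjective, we can choose $$v_\psi$$ such that $$A \, v_\psi = -e_i$$ and $$W \, v_\psi = 0$$ hold, which implies $$\mu _\psi \, v_\psi = -(\lambda _I)_i > 0$$, so $$\mu \notin \big ( \mathcal {T}_{g(p)}^i{\mathcal {K}} \big )^\circ$$. $$\square$$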
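The sign argument in this proof can be replayed on a small example. The sketch below is illustrative only: the matrices `A` and `W`, the sample vectors, and the helper names are hypothetical choices, with multipliers treated as row vectors so that $$\mu _\psi = \lambda _I \, A + \lambda _E \, W$$.

```python
def dot(x, y):
    """Euclidean pairing of a row vector with a column vector."""
    return sum(a * b for a, b in zip(x, y))


def matvec(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [dot(row, v) for row in M]


def multiplier(lam_I, lam_E, A, W):
    """Row vector mu_psi = lam_I A + lam_E W."""
    n = len(A[0])
    return [
        sum(l * row[j] for l, row in zip(lam_I, A))
        + sum(l * row[j] for l, row in zip(lam_E, W))
        for j in range(n)
    ]


def in_inward_cone(v, A, W):
    """True if W v = 0 and A v <= 0 componentwise."""
    return all(x == 0 for x in matvec(W, v)) and all(x <= 0 for x in matvec(A, v))


# Hypothetical adapted-chart data: two inequality rows, one equality row.
A = [[1, 0, 0], [0, 1, 0]]
W = [[0, 0, 1]]
samples = [[-1, -3, 0], [0, -1, 0], [0, 0, 0]]

mu = multiplier([2, 0], [-5], A, W)       # lam_I >= 0, lam_E of arbitrary sign
mu_bad = multiplier([-1, 0], [-5], A, W)  # first inequality multiplier negative
witness = [-1, 0, 0]                      # satisfies A v = -e_1 and W v = 0

print([dot(mu, v) for v in samples])
print(dot(mu_bad, witness))
```

All pairings with `mu` are non-positive on the sampled cone elements, whereas `mu_bad` is positive on `witness`, which mirrors the converse direction of the proof.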

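We return to Example 2.3 and recall that the rows of $${\widehat{A}} \in \mathbb {R}^{\ell \times k}$$ consist of those unit vectors for which $$g_I(x_*)_i = 0$$ holds. We observe the representation

\begin{aligned} \mu = (\eta _I, \eta _E) = (\lambda _I \, {\widehat{A}}, \lambda _E) . \end{aligned}

Thus we obtain the classical complementarity result:

\begin{aligned} \eta _I \ge 0 , \quad g_I(x_*) \le 0 , \quad \eta _I \, g_I(x_*) = 0 , \end{aligned}

together with the well-known dual equation:

\begin{aligned} f'(x_*) + \eta _I \, g_I'(x_*) + \eta _E \, g_E'(x_*) = 0 . \end{aligned}

After transposition, it takes the more familiar form

\begin{aligned} \nabla f(x_*) + g_I'(x_*)^T \eta _I^T + g_E'(x_*)^T \eta _E^T = 0 . \end{aligned}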

## 5 Retractions and Linearizing Maps

Numerical solution algorithms frequently employ retractions to pull back optimization problems on manifolds to the corresponding tangent spaces. In this section we will consider reformulations of the KKT conditions (4.1) in terms of these objects. This is an alternative to our approach via local charts employed in Sect. 4 and it allows us to argue more conveniently in some cases. Moreover, retractions are also the approach we take for the second-order analysis in Sect. 8.

We will use the following definitions:

### Definition 5.1

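Let $$\mathcal {V}_{p} \subset \mathcal {T}_{p}{\mathcal {M}}$$ be a neighborhood of $$0_p \in \mathcal {T}_{p}{\mathcal {M}}$$. A $$C^2$$-mapping $$R_p :\mathcal {V}_p \rightarrow \mathcal {M}$$ is called a local retraction at p if it satisfies:

(Ri):

$$R_p(0_p) = p$$,

(Rii):

$$DR_p(0_p) = id _{\mathcal {T}_{p}{\mathcal {M}}}$$.

Let $$\mathcal {U}_{q} \subset \mathcal {N}$$ be a neighborhood of $$q \in \mathcal {N}$$. A $$C^2$$-mapping $$S_q :\mathcal {U}_q \rightarrow \mathcal {T}_{q}{\mathcal {N}}$$ is called a local linearizing map at q if it satisfies:

(Si):

$$S_q(q) = 0_q$$,

(Sii):

$$DS_q(q) = id _{\mathcal {T}_{q}{\mathcal {N}}}$$.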

We call $$S_q$$ adapted to $$\mathcal {K}$$ if $$S_q(\mathcal {U}_q \cap \mathcal {K}) = S_q(\mathcal {U}_q) \cap \mathcal {T}_{q}^i{\mathcal {K}}$$ holds.

Every chart $$\varphi$$ on $$\mathcal {M}$$, centered at p, induces a local retraction at p via $$R_p(v) {:}{=}\varphi ^{-1}(v_\varphi )$$, where $$v_\varphi$$ denotes the representative of v. Moreover, every adapted chart $$\psi$$ on $$\mathcal {N}$$, centered at q, induces an adapted linearizing map: for any $$\eta \in \mathcal {U}_q$$ we define $$S_q(\eta ) \in \mathcal {T}_{q}{\mathcal {N}}$$ by the equivalence class of $$v_\psi {:}{=}\psi (\eta )$$. If $$\mathcal {K}$$ is a geodesic polyhedron on a Riemannian manifold $$\mathcal {N}$$ as in Example 2.4, then the logarithmic map $$S_q {:}{=}\log _q$$ yields an adapted linearizing map at q.
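To make the defining properties of Definition 5.1 concrete, the following sketch checks them numerically on the unit sphere in $$\mathbb {R}^3$$. Both maps are assumptions chosen for illustration and are not taken from the text: `retract` normalizes $$p + v$$ (a projection-type retraction), and `linearize` projects orthogonally onto the tangent plane. Derivatives are approximated by one-sided difference quotients.

```python
import math


def retract(p, v):
    """Projection-type retraction on the unit sphere: normalize p + v."""
    x = [a + b for a, b in zip(p, v)]
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]


def linearize(q, eta):
    """Orthogonal projection of eta onto the tangent plane at q (sends q to 0)."""
    s = sum(a * b for a, b in zip(eta, q))
    return [a - s * b for a, b in zip(eta, q)]


def directional_derivative(curve, t=1e-6):
    """One-sided difference quotient (curve(t) - curve(0)) / t."""
    c0, c1 = curve(0.0), curve(t)
    return [(b - a) / t for a, b in zip(c0, c1)]


p = [0.0, 0.0, 1.0]
v = [0.3, -0.4, 0.0]  # tangent at p, since <p, v> = 0

# First-order condition of the retraction: derivative at the zero vector in direction v.
dR = directional_derivative(lambda t: retract(p, [t * c for c in v]))

# First-order condition of the linearizing map, along a curve through p with velocity v.
dS = directional_derivative(lambda t: linearize(p, retract(p, [t * c for c in v])))

print(retract(p, [0.0, 0.0, 0.0]), dR)
print(linearize(p, p), dS)
```

Both difference quotients agree with `v` up to the discretization error, which is consistent with the derivative conditions being the identity on the tangent space.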

### Remark 5.2

Retractions are widely used in optimization algorithms on manifolds; see, e. g., [1]. Linearizing maps for constrained problems were introduced in [14], but a similar concept has been used in a different context in [3] under the name “generalized logarithmic map”.

The concept of adapted linearizing maps may be useful for the implementation of numerical algorithms in this setting. As we will see below, it allows us to write down a local optimization problem at $$p_*$$ in a way that resembles a classical formulation without the need of further linearization of $$S_{g(p_*)}(\mathcal {U}_{g(p_*)} \cap \mathcal {K})$$.

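Let p be a feasible point of (2.1) and let $$R_p$$ with domain $$\mathcal {V}_p \subset \mathcal {T}_{p}{\mathcal {M}}$$ and $$S_{g(p)}$$ with domain $$\mathcal {U}_{g(p)} \subset \mathcal {N}$$ be a given local retraction and adapted linearizing map, respectively. Choosing their domains of definition sufficiently small, we may assume without loss of generality that $$R_p$$ and $$S_{g(p)}$$ are injective with $$g(R_p(\mathcal {V}_p)) \subset \mathcal {U}_{g(p)}$$. We can now locally pull back our problem as follows:

\begin{aligned} {\varvec{f}}{:}{=}f \circ R_p , \quad {\varvec{g}}{:}{=}S_{g(p)} \circ g \circ R_p , \quad {\varvec{K}}{:}{=}S_{g(p)}(\mathcal {U}_{g(p)} \cap \mathcal {K}) \end{aligned}

and formulate a local optimization problem on the tangent space at p:

\begin{aligned} \begin{aligned} \text {Minimize} \quad&{\varvec{f}}(v) , \quad \text {where } v \in \mathcal {V}_p \\ \text {s. t.} \quad&{\varvec{g}}(v) \in {\varvec{K}}. \end{aligned} \end{aligned}
(5.1)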

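Here $${\varvec{K}}$$ is the intersection of the polyhedral convex cone $$\mathcal {T}_{g(p)}^i{\mathcal {K}}$$ and a neighborhood of $$0_{g(p)}$$, because $$S_{g(p)}$$ is adapted. It can thus be described by finitely many linear equality and inequality constraints on $$\mathcal {T}_{g(p)}{\mathcal {N}}$$. Neglecting the local neighborhoods, (5.1) is locally a classical constrained optimization problem of the form:

\begin{aligned} \begin{aligned} \text {Minimize} \quad&{\varvec{f}}(v) , \quad \text {where } v \in \mathcal {T}_{p}{\mathcal {M}} \\ \text {s. t.} \quad&A_I \, {\varvec{g}}(v) \le 0 \\ \text {and} \quad&A_E \, {\varvec{g}}(v) = 0 \end{aligned} \end{aligned}
(5.2)

with linear mappings $$A_I :\mathcal {T}_{g(p)}{\mathcal {N}}\rightarrow \mathbb {R}^\ell$$ and $$A_E :\mathcal {T}_{g(p)}{\mathcal {N}}\rightarrow \mathbb {R}^{n-k}$$. Notice that the data of problem (5.2) are, of course, not uniquely defined. For instance, we may premultiply $$A_I$$ by a positive diagonal matrix, and $$A_E$$ by any invertible matrix. However, the viable choices for $$A_I$$ and $$A_E$$ do not depend on the choice of $$S_{g(p)}$$.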

### Theorem 5.3

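Suppose that $$p_*$$ is a feasible point of (2.1). Then $$p_*$$ is locally optimal for (2.1) if and only if $$0_{p_*}$$ is a local minimizer of (5.1). In this case, if (ZKRCQ) holds at $$p_*$$, then there exists $$\mu \in \mathcal {T}^*_{g(p_*)}{\mathcal {N}}$$ such that

\begin{aligned} 0 = {\varvec{f}}'(0_{p_*}) + \mu \, {\varvec{g}}'(0_{p_*}) , \quad \mu \in \big ( \mathcal {C}_{0_{p_*}}{{\varvec{K}}} \big )^\circ . \end{aligned}

### Proof

Clearly, $$0_{p_*}$$ is a local minimizer of (5.1) if and only if $$p_*$$ is a local minimizer of (2.1). Moreover, by the chain rule, using property (Rii) of $$R_{p_*}$$ and property (Sii) of $$S_{g(p_*)}$$: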

\begin{aligned} {\varvec{f}}'(0_{p_*}) = f'(p_*) , \quad {\varvec{g}}'(0_{p_*}) = g'(p_*) , \quad \mathcal {C}_{0_{p_*}}{{\varvec{K}}} = \mathcal {T}_{g(p_*)}^i{\mathcal {K}} . \end{aligned}

Thus, our conditions directly follow from (4.1). $$\square$$

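As an alternative approach, we can apply a classical theorem on KKT conditions to (5.2) and obtain

\begin{aligned} \begin{aligned} 0&= {\varvec{f}}'(0_{p_*}) + \lambda _I \, A_I \, {\varvec{g}}'(0_{p_*}) + \lambda _E \, A_E \, {\varvec{g}}'(0_{p_*}) , \\ 0&\le \lambda _I \end{aligned} \end{aligned}
(5.3)

with $$\lambda _I \in \mathbb {R}^\ell$$ and $$\lambda _E \in \mathbb {R}^{n-k}$$, which depend on the choice of $$A_I$$ and $$A_E$$. By invariance, the first row equivalently yields:

\begin{aligned} 0 = f'(p_*) + (\lambda _I \, A_I + \lambda _E \, A_E) \, g'(p_*) \end{aligned}

and thus by comparison,

\begin{aligned} \mu = \lambda _I \, A_I + \lambda _E \, A_E . \end{aligned}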

We emphasize that the number of rows in $$A_I$$, which is equal to the index $$\ell$$ of the corner $$g(p_*)$$, depends on $$g(p_*)$$. Thus, there is no further distinction necessary between active and inactive constraints, because this is already built into the local representation of $$\mathcal {K}$$.

The formulation (5.3) allows us to split the given constraints into individual components and to distinguish strongly active and weakly active constraints, according to the structure of $$\lambda _I$$.

### Definition 5.4

We call the i-th constraint $$(A_I)_i \, {\varvec{g}}\le 0$$ weakly active at $$(p_*,\lambda _I,\lambda _E)$$ if $$(\lambda _I)_i = 0$$ holds, and strongly active in case $$(\lambda _I)_i > 0$$.

Observe that this definition does not depend on the particular choice of $$A_I$$. If $$A_I$$ is premultiplied by a positive diagonal matrix, then the notion of weak and strong activity of $$(A_I)_i$$ is not changed.
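The invariance under positive row scaling can be illustrated with a few lines of code. This is a sketch under the assumption that the multiplier $$\mu = \lambda _I \, A_I$$ is kept fixed while $$A_I$$ is replaced by $$D \, A_I$$ with a positive diagonal matrix $$D$$ (equality constraints are omitted for brevity); the function names and sample data are hypothetical.

```python
def classify(lam_I):
    """Label each inequality multiplier (assumed non-negative) by its activity type."""
    return ["strong" if l > 0 else "weak" for l in lam_I]


def rescale(lam_I, d):
    """Multipliers after replacing A_I by diag(d) A_I with d > 0.

    The row vector lam_I A_I has to stay the same, so each entry of lam_I
    is divided by the corresponding positive scaling factor.
    """
    return [l / s for l, s in zip(lam_I, d)]


def row_combination(lam, A):
    """Row vector lam A for a matrix A given as a list of rows."""
    return [sum(l * row[j] for l, row in zip(lam, A)) for j in range(len(A[0]))]


A_I = [[1.0, 0.0], [0.0, 2.0]]
lam_I = [3.0, 0.0]
d = [2.0, 5.0]

A_scaled = [[s * a for a in row] for s, row in zip(d, A_I)]
lam_scaled = rescale(lam_I, d)

print(classify(lam_I), classify(lam_scaled))
print(row_combination(lam_I, A_I), row_combination(lam_scaled, A_scaled))
```

Since dividing by a positive number preserves both zeros and positivity, the labels coincide before and after scaling, while the represented multiplier remains unchanged.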

## 6 Lagrangian Functions

When $$\mathcal {N}= V$$ is a normed linear space with dual space $$V^*$$ and $$g :\mathcal {M}\rightarrow V$$, then a Lagrangian function for our problem (2.1) with Lagrange multiplier $$\mu \in V^*$$ can be defined as usual:

\begin{aligned} L :\mathcal {M}\times V^* \ni (p,\mu ) \mapsto L(p,\mu ) {:}{=}f(p) + \mu (g(p)) \in \mathbb {R}. \end{aligned}

However, when $$\mathcal {N}$$ is a nonlinear manifold, $$\mu$$ cannot be defined as a linear functional on $$\mathcal {N}$$. Rather, we need to replace it with a function $$h \in C^1(\mathcal {N},\mathbb {R})$$ and define

\begin{aligned} L :\mathcal {M}\times C^1(\mathcal {N},\mathbb {R}) \ni (p,h) \mapsto L(p,h) {:}{=}f(p) + h(g(p)) \in \mathbb {R}\end{aligned}

as a Lagrangian function. In the following we will consider h fixed and regard the mapping $$p \mapsto L(p,h):\mathcal {M}\rightarrow \mathbb {R}$$ as a function in p. Its derivative $$L'$$ is given by

\begin{aligned} L'(p,h) {:}{=}\frac{\mathop {}\!d }{\mathop {}\!d p} L(p,h) = f'(p) + h'(g(p)) \, g'(p) . \end{aligned}

For these derivatives to be well-defined at a point p, it is enough that h is defined in some neighborhood of $$g(p)$$. We can observe two things. First, $$\mu {:}{=}h'(g(p)) \in \mathcal {T}^*_{g(p)}{\mathcal {N}}$$ can be interpreted as a Lagrange multiplier; second, $$L'(p,h)$$ only depends on $$\mu = h'(g(p))$$ and not on the particular choice of h.

The paragraph above explains how to obtain $$\mu$$ from h. Conversely, let $$p_* \in \mathcal {M}$$ be fixed and $$q_* = g(p_*)$$. In view of the KKT conditions (4.1) we would like to extend a Lagrange multiplier $$\mu \in \mathcal {T}^*_{q_*}{\mathcal {N}}$$ locally to a nonlinear function h on a neighborhood of $$q_*$$ such that $$h'(q_*) = \mu$$ holds. This can be achieved by using a linearizing map $$S_{q_*}$$ about $$q_*$$ and defining $$h {:}{=}\mu \circ S_{q_*}$$. Then we obtain a Lagrangian function of the form

\begin{aligned} L_{S_{q_*}}(p,\mu ) {:}{=}L(p,\mu \circ S_{q_*}) = f(p) + \mu \circ S_{q_*} \circ g(p) . \end{aligned}

Since $$h'(q_*) = \mu \circ DS_{q_*}(q_*) = \mu$$, we obtain with this definition of h:

\begin{aligned} L'_{S_{q_*}}(p_*,\mu ) = f'(p_*) + \mu \, g'(p_*) = L'(p_*,h) . \end{aligned}
(6.1)

Alternatively, we may define Lagrangian functions near $$p_*$$ with $$q_* = g(p_*)$$ via pull-backs:

\begin{aligned} {\varvec{L}}(v,\mu ) {:}{=}{\varvec{f}}(v) + \mu \, {\varvec{g}}(v) \end{aligned}

with derivative

\begin{aligned} {\varvec{L}}'(v,\mu ) = {\varvec{f}}'(v) + \mu \, {\varvec{g}}'(v) \quad \text {and thus} \quad {\varvec{L}}'(0_{p_*},\mu ) = f'(p_*) + \mu \, g'(p_*). \end{aligned}

It is therefore justified to define the derivative of the Lagrangian function in the following way:

\begin{aligned} \begin{aligned} L'(p_*,\mu ) {:}{=}f'(p_*) + \mu \, g'(p_*) = {\varvec{L}}'(0_{p_*},\mu ) = L'_{S_{q_*}}(p_*,\mu ) = L'(p_*,h) \\ \text {for } \mu = h'(q_*) , \end{aligned} \end{aligned}
(6.2)

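independently of the choice of the retraction $$R_{p_*}$$, linearizing map $$S_{q_*}$$, and h, as long as $$\mu = h'(q_*)$$. Utilizing the identifications $$\mu {:}{=}h'(g(p_*))$$ and $$h {:}{=}\mu \circ S_{g(p_*)}$$, we find that the KKT conditions (4.1) can equivalently be written in the familiar way:

\begin{aligned} L'(p_*,\mu )&= 0 , \end{aligned}
(6.3a)
\begin{aligned} \mu&\in \big ( \mathcal {T}_{g(p_*)}^i{\mathcal {K}} \big )^\circ . \end{aligned}
(6.3b)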

## 7 The Critical Cone

To derive second-order optimality conditions, we need a definition of the critical cone at a KKT point $$p_*$$ as a subset of the tangent cone $$\mathcal {C}_{p_*}{\mathcal {F}}$$. Suppose that $$(p_*,\mu )$$ satisfies the KKT conditions (4.1). We define the critical cone at $$p_*$$ as

We also introduce the definition

where $$(A_I)_j$$ are the components of the mapping used in (5.2). Then we can write for any Lagrange multiplier $$\mu \in \varLambda (p_*)$$.

The following considerations will be useful for the discussion of second-order conditions:

### Lemma 7.1

Suppose that X is a normed linear space and U, V are open neighborhoods of $$0 \in X$$. Consider a diffeomorphism $$\varPhi :U \rightarrow V$$ such that $$\varPhi (0) = 0$$ and $$\varPhi '(0) = id _X$$ hold. Let K be a polyhedral cone of the form

\begin{aligned} K = \{ x \in X : A_I \, x \le 0 , \; A_E \, x = 0 \} \end{aligned}

with linear maps $$A_I :X \rightarrow \mathbb {R}^{n_I}$$ and $$A_E :X \rightarrow \mathbb {R}^{n_E}$$. Suppose that

\begin{aligned} \varPhi :K \cap U \rightarrow K \cap V \end{aligned}

is bijective. Select a row $$a_j = (A_I)_j$$ and define the facet

\begin{aligned} K_j {:}{=}\{ x \in K : a_j \, x = 0 \} . \end{aligned}

Then there are neighborhoods $${\tilde{U}}$$ and $${\tilde{V}}$$ of 0 such that

\begin{aligned} \varPhi :K_j \cap {\tilde{U}} \rightarrow K_j \cap {\tilde{V}} \end{aligned}

is also bijective.

### Proof

We may assume w.l.o.g. that $${\tilde{U}} = U = B_r(0)$$ is an open ball of radius r about 0. Since $$\varPhi$$ is a homeomorphism and thus preserves boundaries of sets, we conclude in particular that

\begin{aligned} \varPhi :\partial K \cap U \rightarrow \partial K \cap V \end{aligned}

is also a homeomorphism. Consider now the “open” facet

\begin{aligned} {\tilde{K}}_j {:}{=}\{ x \in K : a_j \, x = 0 , \; a_i \, x < 0 \text { for all } i \ne j \} , \end{aligned}

which is a relatively open subset of $$\partial K$$. Then $$U \cap {\tilde{K}}_j$$ is a connected set, because U and $${\tilde{K}}_j$$ are both convex and hence so is their intersection. The continuity of $$\varPhi$$ implies that $$\varPhi (U \cap {\tilde{K}}_j)$$ is connected as well. However, any union of two or more distinct open facets is not connected, because each $${\tilde{K}}_j$$ is a relatively open subset of this union. Hence, $$\varPhi (U \cap {\tilde{K}}_j)$$ is a subset of an open facet $${\tilde{K}}_\ell$$ and it remains to show $$j = \ell$$. Since $$\varPhi '(0) = id _X$$ holds, we find that

\begin{aligned} \varPhi '(0) :{\tilde{K}}_j \rightarrow {\tilde{K}}_j \end{aligned}

is bijective. Using the differentiability of $$\varPhi$$ this implies that there exists $$x_0 \in {\tilde{K}}_j$$ such that $$\varPhi (x_0) \in {\tilde{K}}_j$$ holds. We thus conclude that $$\varPhi (U \cap {\tilde{K}}_j) \subset {\tilde{K}}_j$$.

Picking some $$B_\rho (0) \subset V$$ we can show by the same argumentation

\begin{aligned} \varPhi ^{-1}(B_\rho (0) \cap {\tilde{K}}_j) \subset {\tilde{K}}_j \cap U \end{aligned}

and thus $$B_\rho (0) \cap {\tilde{K}}_j \subset \varPhi ({\tilde{K}}_j \cap U)$$. Thus, $$\varPhi (U \cap {\tilde{K}}_j)$$ can be written as $${\tilde{K}}_j \cap {\tilde{V}}$$, where $${\tilde{V}}$$ is a neighborhood of 0. $$\square$$

This lemma can be applied recursively also to subfacets of K. Hence, after finitely many steps of application, we conclude in particular that there are neighborhoods U and V of 0 such that $$\varPhi$$ maps $$\mathcal {C}^{\mathrm {crit}}_{\mathcal {N}} \cap U$$ bijectively onto $$\mathcal {C}^{\mathrm {crit}}_{\mathcal {N}} \cap V$$.

### Lemma 7.2

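Consider two adapted linearizing maps $$S_{q,1}$$ and $$S_{q,2}$$ and the transition map $$\varTheta {:}{=}S_{q,1} \circ S_{q,2}^{-1}$$. Then

\begin{aligned} \begin{aligned} \varTheta ''(0_q)[v, v]&\in {{\,\mathrm{span}\,}}\mathcal {T}_{q}^i{\mathcal {K}} \quad \text {for all } v \in \mathcal {T}_{q}^i{\mathcal {K}} , \\ \varTheta ''(0_q)[v, v]&\in {{\,\mathrm{span}\,}}\mathcal {C}^{\mathrm {crit}}_{\mathcal {N}} \quad \text {for all } v \in \mathcal {C}^{\mathrm {crit}}_{\mathcal {N}} . \end{aligned} \end{aligned}

### Proof

Consider any cone $$K \subset \mathcal {T}_{q}{\mathcal {N}}$$ such that $$\varTheta$$ maps K into K. Since $$\varTheta (0_q) = 0_q$$ and $$\varTheta '(0_q) = id _{\mathbb {R}^n}$$ hold, we can compute for $$v \in K$$

\begin{aligned} \varTheta ''(0_q)[v, v] = \lim _{t \searrow 0} \frac{2}{t^2} \big ( \varTheta (t \, v) - t \, v \big ) . \end{aligned}

Since both $$\varTheta (t \, v)$$ and $$t \, v$$ belong to K, $$\varTheta (t \, v) - t \, v$$ belongs to $${{\,\mathrm{span}\,}}K$$ and thus so does the limit. By definition, $$\varTheta$$ maps $${\varvec{K}}\subset \mathcal {T}_{q}^i{\mathcal {K}}$$ into $$\mathcal {T}_{q}^i{\mathcal {K}}$$ and thus $$\varTheta ''(0_q)[v, v] \in {{\,\mathrm{span}\,}}\mathcal {T}_{q}^i{\mathcal {K}}$$ for $$v \in \mathcal {T}_{q}^i{\mathcal {K}}$$, proving our first assertion. Our second assertion follows similarly, because $$\varTheta$$ maps $$\mathcal {C}^{\mathrm {crit}}_{\mathcal {N}}$$ into $$\mathcal {C}^{\mathrm {crit}}_{\mathcal {N}}$$ by Lemma 7.1. $$\square$$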

## 8 Second-Order Optimality Conditions

Compared to the case in vector spaces, the formulation of second-order conditions on manifolds exhibits an additional difficulty. On a vector space V the second derivative of a real-valued function $$\sigma :V \rightarrow \mathbb {R}$$ at $$x \in V$$ can be represented as a bilinear form $$\sigma ''(x) :V \times V \rightarrow \mathbb {R}$$, whose definiteness properties can be studied. In contrast, for $$\sigma :\mathcal {M}\rightarrow \mathbb {R}$$ we have $$\sigma ' :\mathcal {M}\rightarrow \mathcal {T}^*\mathcal {M}$$ and thus $$\sigma '' :\mathcal {T}\mathcal {M}\rightarrow \mathcal {T}(\mathcal {T}^*\mathcal {M})$$. The required representation of $$\sigma ''(p)$$ as a bilinear form on $$\mathcal {T}_{p}{\mathcal {M}}$$, i.e., $$\sigma ''(p) :\mathcal {T}_{p}{\mathcal {M}}\times \mathcal {T}_{p}{\mathcal {M}}\rightarrow \mathbb {R}$$, is not given canonically. A connection, or equivalently, a covariant derivative, has to be specified for this purpose. However, at a stationary point $$p_*\in \mathcal {M}$$, i.e., $$\sigma '(p_*) = 0$$, second derivatives of scalar-valued functions can be represented canonically by bilinear forms on $$\mathcal {T}_{p_*}{\mathcal {M}}$$ without the help of a covariant derivative, as shown in the following lemma.

### Lemma 8.1

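Suppose that $$\sigma \in C^2(\mathcal {M},\mathbb {R})$$. At a point $$p_* \in \mathcal {M}$$ satisfying $$\sigma '(p_*) = 0$$, the second derivative $$\sigma ''(p_*) :\mathcal {T}_{p_*}{\mathcal {M}}\times \mathcal {T}_{p_*}{\mathcal {M}}\rightarrow \mathbb {R}$$ is a well-defined symmetric bilinear form, i.e., a symmetric (2, 0)-tensor.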

### Proof

Consider two charts $${\varphi _1}$$ and $${\varphi _2}$$ centered at $$p_*$$ so that $$\varphi _1(p_*) = \varphi _2(p_*) = 0$$ holds. Then $$\sigma$$ has representations $$\sigma _{\varphi _1} {:}{=}\sigma \circ {\varphi _1}^{-1}$$ and $$\sigma _{\varphi _2} {:}{=}\sigma \circ {\varphi _2}^{-1}$$ in charts, and $$\sigma _{\varphi _1} = \sigma _{\varphi _2} \circ T$$ with $$T = \varphi _2 \circ \varphi _1^{-1}$$. Let $$v_{\varphi _1}$$ and $$v_{\varphi _2}$$ be the representatives of $$v \in \mathcal {T}_{p_*}{\mathcal {M}}$$. Then $$v_{\varphi _2} = T'(0) \, v_{\varphi _1}$$ holds and we have

\begin{aligned} \sigma _{\varphi _1}'(0) \, v_{\varphi _1} = \sigma _{\varphi _2}'(0) \, T'(0) \, v_{\varphi _1} = \sigma _{\varphi _2}'(0) \, v_{\varphi _2} . \end{aligned}

Using $$\sigma '(p_*) = 0$$ we find

\begin{aligned} \sigma _{\varphi _1}''(0)[v_{\varphi _1}, v_{\varphi _1}]&= \sigma _{\varphi _2}''(0)[T'(0) \, v_{\varphi _1}, T'(0) \, v_{\varphi _1}] + \sigma _{\varphi _2}'(0) \, T''(0)[v_{\varphi _1}, v_{\varphi _1}] \\&= \sigma _{\varphi _2}''(0)[T'(0) \, v_{\varphi _1}, T'(0) \, v_{\varphi _1}] . \end{aligned}

This implies the well-definedness of $$\sigma ''(p_*)$$ on $$\mathcal {T}_{p_*}{\mathcal {M}}$$. Its symmetry follows from the theorem of Schwarz. $$\square$$
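The role of stationarity in Lemma 8.1 can be seen in a one-dimensional numerical experiment. The sketch below is purely illustrative: the transition map $$T(x) = x + x^2$$ (so that $$T'(0) = 1$$ and $$T''(0) = 2$$) and the two chart representations are hypothetical choices, and second derivatives are approximated by central differences.

```python
def second_difference(fun, h=1e-3):
    """Central second difference approximating fun''(0)."""
    return (fun(h) - 2.0 * fun(0.0) + fun(-h)) / (h * h)


def transition(x):
    """Chart transition T with T(0) = 0, T'(0) = 1 and T''(0) = 2."""
    return x + x * x


def sigma_stationary(x):
    """Representation in the second chart with vanishing first derivative at 0."""
    return x * x


def sigma_nonstationary(x):
    """Representation in the second chart with first derivative 1 at 0."""
    return x + x * x


def chart_pair(sigma_2):
    """Second derivatives at 0 in the second chart and in the first chart.

    The representation in the first chart is the composition of sigma_2 with T.
    """
    in_chart_2 = second_difference(sigma_2)
    in_chart_1 = second_difference(lambda x: sigma_2(transition(x)))
    return in_chart_2, in_chart_1


print(chart_pair(sigma_stationary))
print(chart_pair(sigma_nonstationary))
```

Because $$T'(0) = 1$$, both charts use the same representative of the tangent vector. In the stationary case the two values agree (approximately 2), while in the non-stationary case they differ by $$\sigma _{\varphi _2}'(0) \, T''(0) = 2$$.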

As a consequence of Lemma 8.1, second-order optimality conditions for unconstrained optimization problems on $$C^2$$-manifolds can be formulated without recourse to covariant derivatives. Even for constrained problems for which the constraint target manifold $$\mathcal {N}= V$$ is a linear space, we can apply Lemma 8.1 to the Lagrangian function $$L :\mathcal {M}\times V^* \rightarrow \mathbb {R}$$, i.e., $$\sigma (p) {:}{=}L(p,\mu )$$, at a KKT point $$p_*$$ with Lagrange multiplier $$\mu \in V^*$$ and obtain a well-defined second derivative $$L''(p_*,\mu )$$, because $$\sigma '(p_*) = L'(p_*,\mu ) = 0$$.

For the general case of manifold-valued constraints, the situation is more complex, since, as we have seen, a classical Lagrange multiplier $$\mu$$ cannot be used directly to define a Lagrangian function due to lack of linearity of $$\mathcal {N}$$. Instead, a nonlinear function $$h \in C^2(\mathcal {N},\mathbb {R})$$ was used to define $$L(p,h)$$. Although $$L'(p,h)$$ only depends on $$\mu {:}{=}h'(g(p))$$, the situation is different for the second-order derivative. Let $$p_*$$ be a KKT point, $$q_* = g(p_*)$$, and $$\mu \in \mathcal {T}^*_{q_*}{\mathcal {N}}$$ the corresponding Lagrange multiplier such that $$h'(q_*) = \mu$$ and $$L'(p_*,h) = 0$$ hold. Then we can apply Lemma 8.1 to $$\sigma (p) {:}{=}L(p,h) = f(p) + h(g(p))$$ and obtain a well-defined bilinear form at $$p_*$$:

\begin{aligned} L''(p_*,h) :\mathcal {T}_{p_*}{\mathcal {M}}\times \mathcal {T}_{p_*}{\mathcal {M}}\rightarrow \mathbb {R}. \end{aligned}

Unfortunately, $$L''(p_*,h)$$ still depends on the particular choice of h and not only on $$\mu =h'(q_*)$$. This can be seen most clearly when $$\mathcal {M}$$ and $$\mathcal {N}$$ are linear spaces. Then we can compute $$L''(p_*,h)$$ as follows:

\begin{aligned} L''(p_*,h)[v,v] = f''(p_*)[v,v] + \mu \, g''(p_*)[v,v] + h''(q_*)[g'(p_*) \, v, g'(p_*) \, v] , \end{aligned}

and we observe that the third term on the right hand side depends on the second derivative of h. Of course, these second derivatives can be avoided when $$\mathcal {N}$$ is a linear space by taking the canonical choice $$h = \mu$$, but such a canonical choice is not possible when $$\mathcal {N}$$ is nonlinear.
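The dependence on $$h''(q_*)$$ can be reproduced with scalar functions. The following sketch assumes $$\mathcal {M}= \mathcal {N}= \mathbb {R}$$ and uses hypothetical data: $$f(p) = -2p + p^2$$, $$g(p) = p + p^2$$, $$p_* = 0$$, $$q_* = 0$$, and two extensions of the multiplier $$\mu = 2$$, one linear and one with $$h''(q_*) = 10$$. Derivatives are approximated by central differences.

```python
def f(p):
    """Objective with f'(0) = -2 and f''(0) = 2."""
    return -2.0 * p + p * p


def g(p):
    """Constraint map with g(0) = 0, g'(0) = 1 and g''(0) = 2."""
    return p + p * p


def h_linear(q):
    """Linear extension of the multiplier mu = 2."""
    return 2.0 * q


def h_curved(q):
    """Nonlinear extension with the same derivative 2 at q = 0 but h''(0) = 10."""
    return 2.0 * q + 5.0 * q * q


def lagrangian(h):
    """The function p -> f(p) + h(g(p)) for a fixed h."""
    return lambda p: f(p) + h(g(p))


def first_derivative(fun, t=1e-3):
    """Central difference approximating fun'(0)."""
    return (fun(t) - fun(-t)) / (2.0 * t)


def second_derivative(fun, t=1e-3):
    """Central second difference approximating fun''(0)."""
    return (fun(t) - 2.0 * fun(0.0) + fun(-t)) / (t * t)


for h in (h_linear, h_curved):
    L = lagrangian(h)
    print(first_derivative(L), second_derivative(L))
```

Both choices of h give a vanishing first derivative at $$p_* = 0$$, but the second derivatives are approximately 6 and 16; the difference of 10 is exactly the term $$h''(q_*)[g'(p_*) \, v, g'(p_*) \, v]$$ for $$v = 1$$.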

However, suppose we use an adapted linearizing map $$S_{q_*}$$ about $$q_*$$ to define $$h = \mu \circ S_{q_*}$$ and thus $$L_{S_{q_*}}(p,\mu ) = L(p,h)$$ holds. In that case, as we will show now, $$L''(p_*, h)[v, v] = L_{S_{q_*}}''(p_*,\mu )[v,v]$$ is independent of the particular choice of $$S_{q_*}$$ on the critical cone, i.e., for $$v \in \mathcal {C}^{\mathrm {crit}}_{\mathcal {M}}$$. This is all we need in order to formulate second-order optimality conditions in an invariant way.

### Proposition 8.2

Suppose that $$p_*$$ is a KKT point, $$q_* = g(p_*)$$ holds and $$\mu \in \mathcal {T}^*_{q_*}{\mathcal {N}}$$ is a corresponding Lagrange multiplier so that (6.3) is satisfied. Let $$S_{q_*,1}$$ and $$S_{q_*,2}$$ be adapted linearizing maps about $$q_*$$. Then

\begin{aligned} L_{S_{q_*,1}}''(p_*,\mu )[v, v] = L_{S_{q_*,2}}''(p_*,\mu )[v, v] \quad \text {for all } v \in \mathcal {C}^{\mathrm {crit}}_{\mathcal {M}} . \end{aligned}
(8.1)

In view of (6.2), we therefore also refer to $$L_{S_{q_*,i}}''(p_*,\mu )$$ simply as $$L''(p_*,\mu )$$. Moreover, for any pullback $${\varvec{L}}$$ with retraction $$R_{p_*}$$ and adapted linearizing map $$S_{q_*}$$, the relation

\begin{aligned} L''(p_*,\mu )[v, v] = {\varvec{L}}''(0_{p_*},\mu )[v, v] \quad \text {for all } v \in \mathcal {C}^{\mathrm {crit}}_{\mathcal {M}} \end{aligned}

holds.

### Proof

Defining $$\varTheta {:}{=}S_{q_*,1} \circ S_{q_*,2}^{-1}$$, we observe $$\mu \circ S_{q_*,1} = \mu \circ \varTheta \circ S_{q_*,2}$$. Consequently, for and , we have

Since $$p_*$$ is stationary, second derivatives of $$L_{S_{q_*}}(p,\mu )$$ are well-defined and can be computed as follows, using the fact that holds:

For $$v \in \mathcal {C}^{\mathrm {crit}}_{\mathcal {M}}$$ we conclude $${\varvec{g}}'(0_{p_*}) \, v \in \mathcal {C}^{\mathrm {crit}}_{\mathcal {N}}$$ and thus we find by Lemma 7.2, using that the linearizing maps are adapted:

\begin{aligned} \varTheta ''(0_{q_*})[{\varvec{g}}'(0_{p_*}) \, v, {\varvec{g}}'(0_{p_*}) \, v] \in {{\,\mathrm{span}\,}}\mathcal {C}^{\mathrm {crit}}_{\mathcal {N}} . \end{aligned}

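By stationarity and by definition of $$\mathcal {C}^{\mathrm {crit}}_{\mathcal {N}}$$, we infer $$\mu \, w = 0$$ for all $$w \in {{\,\mathrm{span}\,}}\mathcal {C}^{\mathrm {crit}}_{\mathcal {N}}$$ and thus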

\begin{aligned} \mu \circ \varTheta ''(0_{q_*})[{\varvec{g}}'(0_{p_*}) \, v, {\varvec{g}}'(0_{p_*}) \, v] = 0 \quad \text {for all } v \in \mathcal {C}^{\mathrm {crit}}_{\mathcal {M}} , \end{aligned}
(8.2)

which yields the desired result. $$\square$$

### Remark 8.3

The conclusion of Proposition 8.2 can be extended slightly beyond the class of adapted linearizing maps: let us call $$S_{q_*,1}$$ and $$S_{q_*,2}$$ second-order consistent if their transition map $$\varTheta {:}{=}S_{q_*,1} \circ S_{q_*,2}^{-1}$$ satisfies $$\varTheta ''(0_{q_*}) = 0$$. Clearly, (8.2) holds for second-order consistent linearizing maps, even for all $$v \in \mathcal {T}_{p_*}{\mathcal {M}}$$. Hence, (8.1) extends to linearizing maps each of which is second-order consistent with some adapted linearizing map.

### Remark 8.4

The restriction to adapted linearizing maps in Proposition 8.2 is natural, taking into account the definition of a manifold with corners via adapted local charts. To illustrate that this restriction is also essential (up to Remark 8.3), consider $$\mathcal {M}= \mathcal {N}= \mathbb {R}^2$$ with , $$f(p) = -p_1$$, $$g = id _\mathcal {M}$$ and . Then 0 is a local minimizer of f, hold, and

\begin{aligned} 0 = L'(0,\mu ) \, v = \begin{pmatrix} -v_1 \\ 0 \end{pmatrix} + \mu \, \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \quad \text {implies} \quad \mu = \begin{pmatrix} 1 \\ 0\end{pmatrix} . \end{aligned}

Using the adapted linearizing map $$S_{0,1} = id _\mathcal {M}$$, we obtain $$\mu \circ S_{0,1}(p) = p_1$$ and $$L_{S_{0,1}}''(0,\mu )[v, v] = 0$$, but using the non-adapted linearizing map $$S_{0,2}(p) {:}{=}(p_1 + \alpha \, p_2^2, p_2)$$ would yield $$\mu \circ S_{0,2}(p) = p_1 + \alpha \, p_2^2$$ and $$L_{S_{0,2}}''(0,\mu )[v, v] = 2 \, \alpha \, v_2^2$$, which differs from $$L_{S_{0,1}}''(0,\mu )[v, v]$$ whenever $$\alpha \ne 0$$ and $$v_2 \ne 0$$.

In general, it is also not possible to extend (8.1) beyond $$v \in \mathcal {C}^{\mathrm {crit}}_{\mathcal {M}}$$. Using the adapted local linearizing map $$S_{0,3}(p) {:}{=}(p_1 + p_1 \, p_2, p_2)$$ (when $$|p_2| < 1$$, then $$p_1 + p_1 p_2 \ge 0 \Leftrightarrow p_1 \ge 0$$), we obtain $$\mu \circ S_{0,3}(p) = p_1 + p_1 \, p_2$$ and thus $$L_{S_{0,3}}''(0,\mu )[v, v] = 2 \, v_1 \, v_2$$, which coincides with $$L_{S_{0,1}}''(0,\mu )[v, v] = 0$$ on $$\mathcal {C}^{\mathrm {crit}}_{\mathcal {M}}$$ but not on all of $$\mathbb {R}^2$$.
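The computations of this remark can be reproduced numerically. The sketch below rests on several assumptions that are not spelled out above: the feasible set is taken to be $$\mathcal {K}= \{p \in \mathbb {R}^2 : p_1 \le 0\}$$ (which is consistent with the multiplier $$\mu = (1, 0)$$), the critical directions are those with $$v_1 = 0$$, the parameter is fixed to $$\alpha = 3$$, and the third map is taken as $$(p_1 + p_1 \, p_2, p_2)$$. Second derivatives are obtained from central differences, which are exact up to rounding for the quadratic Lagrangians involved.

```python
ALPHA = 3.0      # hypothetical value of the parameter alpha
MU = (1.0, 0.0)  # Lagrange multiplier from the remark


def f(p):
    """Objective f(p) = -p_1."""
    return -p[0]


def S1(p):
    """Identity: an adapted linearizing map."""
    return (p[0], p[1])


def S2(p):
    """Non-adapted linearizing map (p_1 + alpha p_2^2, p_2)."""
    return (p[0] + ALPHA * p[1] ** 2, p[1])


def S3(p):
    """Adapted linearizing map (p_1 + p_1 p_2, p_2), valid for |p_2| < 1."""
    return (p[0] + p[0] * p[1], p[1])


def lagrangian(S):
    """p -> f(p) + mu(S(g(p))) with g the identity."""
    return lambda p: f(p) + MU[0] * S(p)[0] + MU[1] * S(p)[1]


def second_derivative(L, v, t=1e-3):
    """Central second difference of L at 0 in direction v."""
    plus = L((t * v[0], t * v[1]))
    minus = L((-t * v[0], -t * v[1]))
    zero = L((0.0, 0.0))
    return (plus - 2.0 * zero + minus) / (t * t)


critical = (0.0, 1.0)      # direction with v_1 = 0
non_critical = (1.0, 1.0)  # direction outside the critical cone

for S in (S1, S2, S3):
    L = lagrangian(S)
    print(S.__name__, second_derivative(L, critical), second_derivative(L, non_critical))
```

On the critical direction the two adapted maps agree (value 0) while the non-adapted map yields approximately $$2 \alpha = 6$$; off the critical cone the two adapted maps differ (0 versus approximately 2).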

Having achieved an invariant definition of $$L''$$ on the critical cone, second-order optimality conditions for manifold-valued constraints can now be reduced to the classical vector-valued case. Suppose that $$p_* \in \mathcal {M}$$ is a KKT point with Lagrange multiplier $$\mu$$. For any choice of retraction at $$p_*$$ and adapted linearizing map at $$g(p_*)$$, we consider the second derivative of the pullback $${\varvec{L}}''(0_{p_*},\mu )$$, which—as we have seen—is invariant on the critical cone $$\mathcal {C}^{\mathrm {crit}}_{\mathcal {M}}$$.

Invoking well-known results from the literature, we obtain the following second-order sufficient optimality conditions:

### Theorem 8.5

Assume that $$p_* \in \mathcal {M}$$ and $$\mu \in \mathcal {T}^*_{g(p_*)}{\mathcal {N}}$$ satisfy the KKT conditions (4.1). Moreover, suppose that

\begin{aligned} L''(p_*,\mu )[v, v] > 0 \quad \text {holds for all } v \in \mathcal {C}^{\mathrm {crit}}_{\mathcal {M}} \setminus \{0_{p_*}\} . \end{aligned}

Then $$p_*$$ is a strict local minimizer of problem (2.1).

### Proof

It is clear that this result holds for $${\varvec{L}}''(0_{p_*},\mu )$$ and thus, by invariance, it also holds for $$L''(p_*,\mu )$$; see, e. g., [11, Thm. 12.6]. $$\square$$

Concerning second-order necessary optimality conditions, a wide variety of constraint qualifications can be found in the literature (cf., e. g., [4] and references therein), leading to second-order conditions of various strength. We restrict our discussion here to the simplest case:

### Theorem 8.6

Assume that $$p_* \in \mathcal {M}$$ is a local minimizer of problem (2.1) and that (LICQ) holds at $$p_*$$. Then $$p_*$$ satisfies the KKT conditions (4.1) with some Lagrange multiplier $$\mu \in \mathcal {T}^*_{g(p_*)}{\mathcal {N}}$$. Moreover,

\begin{aligned} L''(p_*,\mu )[v, v] \ge 0 \quad \text {holds for all } v \in \mathcal {C}^{\mathrm {crit}}_{\mathcal {M}} . \end{aligned}

### Proof

It is clear that this result holds for $${\varvec{L}}''(0_{p_*},\mu )$$ and thus, by invariance, it also holds for $$L''(p_*,\mu )$$; see, e. g., [11, Thm. 12.5]. $$\square$$

## 9 Application to the Control of Discretized Variational Problems

Suppose that $$\mathcal {Y}$$ and $$\mathcal {U}$$ are smooth manifolds and consider the following energy minimization problem, parametrized (or controlled) by u:

\begin{aligned} \text {Minimize} \quad E(y,u) , \quad \text {where } y \in \mathcal {Y}, \end{aligned}

which we replace by its stationarity condition:

\begin{aligned} 0^*_y = \partial _y E(y,u) =: c(y,u) . \end{aligned}

Such a situation occurs frequently in the infinite-dimensional context of variational problems, where occasionally $$\mathcal {Y}$$ and/or $$\mathcal {U}$$ are nonlinear, smooth manifolds. Also the principle of stationary action, which is applied, e. g., in classical mechanics, leads to problems of a similar form. After discretization, a similar problem on finite-dimensional manifolds is obtained.

Using the control variable u, an optimal control problem or a parameter identification problem may then be formulated as follows:

\begin{aligned} \begin{aligned} \text {Minimize} \quad&f(y,u) , \quad \text {where } (y,u) \in \mathcal {Y}\times \mathcal {U}\\ \text {s. t.} \quad&0^*_y = c(y,u) . \end{aligned} \end{aligned}

A simple concrete example, which has been considered, e. g., in [13, Ch. 6], is the optimal control of a static inextensible flexible rod. Here $$y : [0,1]\rightarrow \mathbb {R}^3$$ is the configuration of the rod, u is an applied force, and $$E(y,u)$$ is the total energy of the rod. Inextensibility is modelled by requiring $$y'(t) \in {\mathbb {S}}^2$$, the unit sphere in $$\mathbb {R}^3$$, for all $$t \in [0,1]$$, which renders $$\mathcal {Y}$$ a nonlinear manifold. An appropriate objective function f may comprise the distance of y to some desired configuration and a Tychonov term for u. For details we refer to [13, Ch. 6] and [14].

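Setting $$p {:}{=}(y,u)$$, $$\mathcal {M}{:}{=}\mathcal {Y}\times \mathcal {U}$$, $$\mathcal {N}{:}{=}\mathcal {T}^*\mathcal {Y}$$, and taking $$\mathcal {K}$$ to be the zero-section of $$\mathcal {T}^*\mathcal {Y}$$, i.e., the pairs $$(y,0^*_y)$$, which can be identified with $$\mathcal {K}= \mathcal {Y}$$, we observe that this problem fits into our theoretical framework, where the constraint mapping is defined as follows:

\begin{aligned} g :\mathcal {M}\rightarrow \mathcal {T}^*\mathcal {Y}, \quad g(y,u) {:}{=}(y, c(y,u)) . \end{aligned}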

To formulate first-order optimality conditions, we calculate the derivative at a feasible point:

At $$0_y$$ we can utilize the canonical splitting (a connection or covariant derivative is not required here) of the cotangent’s tangent space

into the tangent space of the base manifold and a fibre. This allows us to write $$g'(p)$$ as a pair:

and the tangent space of $$\mathcal {K}$$ as:

Thus the linearized constraints can be split into two parts, the first of which is redundant:

Constraint qualifications are fulfilled at p, provided that holds. This is the case if and only if is surjective.

A Lagrange multiplier $$\mu$$ is an element of

where the last identity is the canonical identification of the bidual space with the primal space. A Lagrange multiplier thus is a pair

These splittings yield and thus the KKT-conditions read

Since $$c(y,u) = \partial _y E(y,u)$$ is a linear form on , $$c'(y,u)$$ can be interpreted as a bilinear form on and we have (notice that $$\partial _{yy} E(y,u)$$ is well-defined by Lemma 8.1, since $$\partial _yE(y,u) = 0$$ holds):

\begin{aligned} \lambda \, c'(y,u)(\delta y,\delta u) = (\partial _{y} E)'(y,u)(\lambda ,\delta y,\delta u) = \partial _{yy} E(y,u)(\lambda ,\delta y) +\partial _{yu} E(y,u)(\lambda ,\delta u). \end{aligned}

Then the KKT conditions read in more detail:

To write down a Lagrangian function and second-order conditions, we need adapted linearizing maps on the zero section of $$\mathcal {T}^*\mathcal {Y}$$ at a KKT point $$p_* = (y_*,u_*)$$ with $$q_* = g(p_*) = (y_*,0_{y_*}^*)$$. Utilizing the above splitting, these are those mappings which map the zero section $$\mathcal {K}= \mathcal {Y}$$ to the first factor of the product, i.e., $$0_\eta \mapsto (\delta y(\eta ),0_y)$$. For a specific example, consider a $$C^2$$-retraction with derivative . Then an adapted linearizing map can be given as:

Since holds, it follows that , and $$S_{q_*} (y,0^*_y) = (v,0^*_{y_*})$$, as required. With the help of this linearizing map, the Lagrange multiplier $$\mu$$ can be extended locally to a function as follows:

and thus the Lagrangian function near $$p_*$$ reads:

Its first derivative at a feasible point, where $$\partial _y E(y,u) = 0$$ holds, is given by

For the KKT point $$p_*$$ we observe $$L'_{S_{q_*}}(p_*,\mu ) = 0$$, since .

Since $$\mathcal {T}_{(y,0_y^*)}^i{\mathcal {K}}$$ is a linear subspace in our setting, the critical cone $$\mathcal {C}^{\mathrm {crit}}_{\mathcal {M}}$$ is given as the preimage of under $$g'(p_*)$$, so it is the set

Finally, the second derivative of the Lagrangian at $$p_*$$ is well-defined on $$\mathcal {C}^{\mathrm {crit}}_{\mathcal {M}}$$ and can, at least formally, be written as:

\begin{aligned} \begin{aligned} L_{S_{q_*}}''(y_*,u_*,\lambda )[\delta p, \delta p] = (f''(y_*,u_*)+ (\partial _yE)''(y_*,u_*)(\lambda ))[(\delta y,\delta u),(\delta y,\delta u)] \\ \text {for all } (\delta y,\delta u) \in \mathcal {C}^{\mathrm {crit}}_{\mathcal {M}}. \end{aligned} \end{aligned}

As a consequence of the restriction $$(\delta y,\delta u) \in \mathcal {C}^{\mathrm {crit}}_{\mathcal {M}}$$ and the fact that $$S_{q_*}$$ is adapted, terms containing are not present in this formula, which reflects Proposition 8.2.

## 10 Conclusion and Outlook

In this paper we have extended the analysis of optimization problems on manifolds from vector space-valued constraints to the much more flexible case of manifold-valued constraints. We have seen that such problems arise naturally when constraints are formulated in a geometric way, and in the optimal control of variational problems on manifolds. We generalized the polyhedric structure required for inequality constraints by using submanifolds with corners and adapted local charts.

First-order optimality conditions were derived, which directly generalize the known cases. An appropriate definition of the Lagrangian function and the formulation of well-defined second-order optimality conditions, however, revealed the significance of the above-mentioned polyhedric structure, reflected by the important role played by adapted linearizing maps. We emphasize that in order to derive the theory, Riemannian metrics or connections were not needed.

Most of the stated results may be generalized to infinite-dimensional Banach manifolds. However, we expect additional technical difficulties. First, it seems to be an open problem how to generalize Definition 2.1 to the infinite-dimensional case, i.e., to define corners of infinite index $$\ell = \infty$$ in a useful way. Second, already in infinite-dimensional Banach spaces, optimality conditions exhibit several topological subtleties, which have to be tackled in the case of Banach manifolds as well.

Further, algorithmic approaches for this class of optimization problems are still to be developed, even in the finite-dimensional setting. An idea would be to extend SQP methods to this setting. At every iterate $$x_k$$ we perform a local pull-back of the given problem to tangent spaces, using retractions and adapted linearizing maps. Locally, we end up with a problem of the form (5.2). A QP step may then be computed for this pull-back, and an update can be defined via a retraction. A detailed realization of this basic idea is, however, subject to future research.