1 Introduction

Shape optimization is of great importance in a wide range of applications. Many real-world problems can be reformulated as shape optimization problems which are constrained by partial differential equations (PDEs). Examples include aerodynamic shape optimization [49], acoustic shape optimization [50], the optimization of interfaces in transmission problems [19, 48], image restoration and segmentation [24], electrochemical machining [23] and the inverse modelling of skin structures [47]. The subject of shape optimization is covered by several fundamental monographs, see, for instance, [14, 56].

Questions like How can shapes be defined? or What does the set of all shapes look like? have been extensively studied in recent decades. Already in 1984, David G. Kendall introduced the notion of a shape space in [30]. Often, a shape space is modeled as a linear (vector) space, which in the simplest case is made up of vectors of landmark positions (cf. [13, 30]). However, there is a large number of different shape concepts, e.g., plane curves [42, 43], surfaces in higher dimensions [4, 31, 40], boundary contours of objects [18, 36, 64], multiphase objects [63], characteristic functions of measurable sets [65] and morphologies of images [15]. In many processes in engineering, medical imaging and science, there is great interest in equipping the space of all shapes with a significant metric to distinguish between different shape geometries. In the simplest shape space case (landmark vectors), the distances between shapes can be measured by the Euclidean distance, but in general, the study of shapes and their similarities is a central problem. In order to tackle natural questions like How different are shapes?, Can we determine the measure of their difference? or Can we infer some information? mathematically, we have to put a metric on the shape space. There are various types of metrics on shape spaces, e.g., inner metrics [4, 42] like the Sobolev metrics, outer metrics [6, 30, 42], metamorphosis metrics [26, 60], the Wasserstein or Monge–Kantorovich metric on the shape space of probability measures [2, 7], the Weil–Petersson metric [34], current metrics [16] and metrics based on elastic deformations [18, 64]. However, it is a challenging task to model both the shape space and the associated metric. There is no common shape space or shape metric suitable for all applications. Different approaches lead to diverse models, and the suitability of an approach depends on the requirements of a given situation.

In contrast to a finite-dimensional optimization problem, which can be obtained, e.g., by representing shapes as splines, the connection of shape calculus with infinite-dimensional spaces [14, 28, 56] leads to a more flexible approach. In recent work, it has been shown that PDE constrained shape optimization problems can be embedded in the framework of optimization on shape spaces. For example, in [53], shape optimization is considered as optimization on a Riemannian shape manifold, the manifold of smooth shapes. Moreover, an inner product called the Steklov–Poincaré metric, suited to the application of finite element (FE) methods, is proposed in [54].

First, we concentrate on the particular manifold of smooth shapes and consider the first Sobolev and the Steklov–Poincaré metric in this paper. The definition of the Riemannian shape gradient with respect to these two metrics results in the formulation of gradient-based optimization algorithms. One aim of this paper is to give an overview of the optimization techniques in the space of smooth shapes together with the first Sobolev and Steklov–Poincaré metric. This paper extends the gradient-based results in [62], where the theory of PDE constrained shape optimization problems is connected with the differential-geometric structure of the space of smooth shapes. To be more precise, this paper aims at the definition of a Riemannian shape Hessian with respect to the first Sobolev metric. In order to formulate such a definition, the covariant derivative needs to be specified. This paper formulates a theorem about the covariant derivative associated with the first Sobolev metric, which opens the door for formulating higher order methods in the space of smooth shapes.

The manifold of smooth shapes contains shapes with infinitely differentiable boundaries, which limits its practical applicability. For example, in the setting of PDE constrained shape optimization, one has to deal with polygonal shape representations from a computational point of view because FE methods are usually used to discretize the models. In [54], not only an inner product, the Steklov–Poincaré metric, but also a suitable shape space for the application of FE methods is proposed. The combination of this particular shape space and its associated inner product is an essential step towards applying efficient FE solvers as outlined in [55]. However, so far, this shape space and its properties have not been investigated. From a theoretical point of view, it is necessary to clarify its structure; without knowing the structure, there is no chance to get control over the space. Thus, this paper aims at a generalization of smooth shapes to shapes which arise naturally in shape optimization problems. We define the space of so-called \(H^{1/2}\)-shapes. Moreover, we clarify its structure as a diffeological one and, thus, take a step towards the formulation of optimization techniques on diffeological spaces. Since a diffeological space is one of the generalizations of a manifold, this paper formulates a theorem which clarifies the difference between manifolds and diffeological spaces.

This paper is organized as follows. In Sect. 2, besides a short overview of basic concepts in shape optimization (Sect. 2.1), the connection of shape calculus with the differential-geometric structure of shape spaces is stated (Sect. 2.2). In particular, the Riemannian shape gradients with respect to the first Sobolev and Steklov–Poincaré metric are defined and the Riemannian shape Hessian with respect to the first Sobolev metric is given. One of the main theorems of this paper is Theorem 2, which specifies the covariant derivative associated with the first Sobolev metric and is necessary for the definition of the Riemannian shape Hessian with respect to this metric. Thanks to the definition of the Riemannian shape Hessian, we are able to formulate the Newton method in the space of smooth shapes together with the first Sobolev metric. Additionally, we give a brief overview of first order optimization techniques based on gradients with respect to the first Sobolev metric as well as the Steklov–Poincaré metric. In particular, Sect. 2.2 ends with a comparison of the gradient-based algorithms for a specific example. Section 3 is concerned with the space of \(H^{1/2}\)-shapes. First, we give a brief introduction to diffeological spaces (Sect. 3.1). The first main theorem of Sect. 3 is Theorem 3, which specifies the difference between diffeological spaces and manifolds. In Sect. 3.2, the space of \(H^{1/2}\)-shapes is defined. Here, Theorem 4, which is the third and last of the main theorems in this paper, endows the space of \(H^{1/2}\)-shapes with its diffeological structure.

2 Optimization in Shape Spaces

First, we set up the notation and terminology of basic shape optimization concepts (Sect. 2.1). Afterwards, shape calculus is combined with geometric concepts of shape spaces (Sect. 2.2). In [62], the theory of shape optimization problems constrained by partial differential equations is already connected with the differential-geometric structure of the space of smooth shapes, and gradient-based methods are outlined. Section 2.2 extends these results by a Riemannian shape Hessian, for which the covariant derivative needs to be specified. This opens the door for formulating higher order methods in the space of smooth shapes. In particular, we formulate a Newton method on the space of smooth shapes based on the definition of the Riemannian shape Hessian.

2.1 Basic Concepts in Shape Optimization

This section sets up the notation and terminology of basic shape optimization concepts used in this paper. For a detailed introduction to shape calculus, we refer to the monographs [14, 56].

One of the main focuses of shape optimization is to investigate shape functionals and solve shape optimization problems. First, we give the definition of a shape functional.

Definition 1

(Shape functional) Let D denote a non-empty subset of \({\mathbb {R}}^d\), where \(d\in {\mathbb {N}}\). Moreover, \({\mathcal {A}}\subset \{\varOmega :\varOmega \subset D\}\) denotes a set of subsets. A function

$$\begin{aligned} J:{\mathcal {A}}\rightarrow {\mathbb {R}}\text {, } \varOmega \mapsto J(\varOmega ) \end{aligned}$$

is called a shape functional.

Let \(J:{\mathcal {A}}\rightarrow {\mathbb {R}}\) be a shape functional, where \({\mathcal {A}}\) is a set of subsets \(\varOmega \) as in Definition 1. An unconstrained shape optimization problem is given by

$$\begin{aligned} \min _{\varOmega \in {\mathcal {A}}} J(\varOmega ). \end{aligned}$$
(1)

Often, shape optimization problems are constrained by equations, e.g., equations involving an unknown function of two or more variables and at least one partial derivative of this function. In this case, the objective functional J has two arguments, the shape \(\varOmega \) as well as the so-called state variable y, where the state variable is the solution of the underlying constraint. A constrained shape optimization problem reads as

$$\begin{aligned} \min _{(\varOmega ,y)\in {\mathcal {A}}\times {\mathcal {X}}(\varOmega )}&J(\varOmega ,y) \end{aligned}$$
(2)
$$\begin{aligned} \text {s.t. } \quad&y=y(\varOmega ) \text { solves } {\mathcal {F}} (\varOmega ,y(\varOmega ))=0, \end{aligned}$$
(3)

where \({\mathcal {X}}(\varOmega )\) is usually a function space and the constraint \({\mathcal {F}} (\varOmega ,y(\varOmega ))=0\) is given for example by a PDE or a system of PDEs. When J in (2) depends on a solution of a PDE, we call the shape optimization problem PDE constrained.

Let D be as in Definition 1. Moreover, let \(\{F_t\}_{t\in [0,T]}\) be a family of mappings \(F_t:{\overline{D}}\rightarrow {\mathbb {R}}^d\) such that \(F_0={\text {id}}\), where \({\overline{D}}\) denotes the closure of D and \(T>0\). This family transforms the domain \(\varOmega \) into new perturbed domains

$$\begin{aligned} \varOmega _t := F_t(\varOmega )=\{F_t(x):x\in \varOmega \}\text { with }\varOmega _0=\varOmega \end{aligned}$$

and the boundary \(\varGamma \) of \(\varOmega \) into new perturbed boundaries

$$\begin{aligned} \varGamma _t := F_t(\varGamma )=\{F_t(x):x\in \varGamma \}\text { with }\varGamma _0=\varGamma . \end{aligned}$$

Such a transformation can be described by the velocity method or by the perturbation of identity. We concentrate on the perturbation of identity, which is defined by \(F_t(x):= x+tV(x)\), where V denotes a sufficiently smooth vector field.

To solve shape optimization problems, we need their shape derivatives.

Definition 2

(Shape derivative) Let \(D\subset {\mathbb {R}}^d\) be open, \(\varOmega \subset D\) and \(k\in {\mathbb {N}}\cup \{\infty \}\). Moreover, let \({\mathcal {C}}^k_0(D,{\mathbb {R}}^d)\) denote the set of \({\mathcal {C}}^k(D,{\mathbb {R}}^d)\)-functions which vanish on \(\partial \varOmega \). The Eulerian derivative of a shape functional J at \(\varOmega \) in direction \(V\in {\mathcal {C}}^k_0(D,{\mathbb {R}}^d)\) is defined by

$$\begin{aligned} DJ(\varOmega )[V]:= \lim \limits _{t\rightarrow 0^+}\frac{J(\varOmega _t)-J(\varOmega )}{t}. \end{aligned}$$
(4)

If for all directions \(V\in {\mathcal {C}}^k_0(D,{\mathbb {R}}^d)\) the Eulerian derivative (4) exists and the mapping

$$\begin{aligned} G(\varOmega ):{\mathcal {C}}^k_0(D,{\mathbb {R}}^d)\rightarrow {\mathbb {R}}, \ V\mapsto DJ(\varOmega )[V] \end{aligned}$$

is linear and continuous, the expression \(DJ(\varOmega )[V]\) is called the shape derivative of J at \(\varOmega \) in direction \(V\in {\mathcal {C}}^k_0(D,{\mathbb {R}}^d)\). In this case, J is called shape differentiable of class \({\mathcal {C}}^k\) at \(\varOmega \).
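The Eulerian derivative (4) can be checked numerically on a discretized shape. The following minimal sketch (ours, using NumPy; all identifiers are our own choices, not from the paper) approximates the difference quotient in (4) for the area functional \(J(\varOmega )=\int _\varOmega 1\,dx\) on a polygonal approximation of the unit disk, with the perturbation of identity \(F_t=\mathrm {id}+tV\) and \(V(x)=x\); since \(\mathrm {div}(V)=2\), the quotient should approach \(2\,J(\varOmega )\).

```python
import numpy as np

def area(poly):
    """Signed area of a closed polygon via the shoelace formula."""
    x, y = poly[:, 0], poly[:, 1]
    return 0.5 * np.sum(x * np.roll(y, -1) - y * np.roll(x, -1))

theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
omega = np.column_stack([np.cos(theta), np.sin(theta)])  # unit disk boundary

V = omega.copy()  # vector field V(x) = x, hence div(V) = 2
for t in [1e-2, 1e-3, 1e-4]:
    quotient = (area(omega + t * V) - area(omega)) / t   # difference quotient in (4)
    print(t, quotient, "expected ->", 2.0 * area(omega))
```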

Remark 1

There are many options to prove shape differentiability of shape functionals which depend on a solution of a PDE and to derive the shape derivative of a shape optimization problem. The min–max approach [14], the chain rule approach [56], the Lagrange method of Céa [10] and the rearrangement method [29] should be mentioned in this context. A nice overview of these approaches is given in [59].

The Hadamard Structure Theorem (cf. [56, Theorem 2.27]) states that under certain assumptions the shape derivative is a distribution acting on the normal part of the perturbation field on the boundary.

Theorem 1

(Hadamard Structure Theorem) Let D and \(\varOmega \) be as in Definition 2. Moreover, let the shape functional J be shape differentiable of class \({\mathcal {C}}^k\) at every domain \(\varOmega \subset D\) with \({\mathcal {C}}^{k-1}\)-boundary \(\varGamma =\partial \varOmega \). Then there exists a scalar distribution \(r\in {\mathcal {C}}^k_0(\varGamma )'\) such that the shape derivative \(G(\varOmega )\in {\mathcal {C}}^k_0(\varOmega ,{\mathbb {R}}^d)'\) of J at \(\varOmega \) is given by

$$\begin{aligned} G(\varOmega )=\gamma _\varGamma '(r\cdot n). \end{aligned}$$
(5)

Here \( {\mathcal {C}}^k_0(\varGamma )'\) and \({\mathcal {C}}^k_0(\varOmega ,{\mathbb {R}}^d)'\) denote the dual spaces of \( {\mathcal {C}}^k_0(\varGamma )\) and \({\mathcal {C}}^k_0(\varOmega ,{\mathbb {R}}^d)\). Moreover,

$$\begin{aligned} \gamma _\varGamma :{\mathcal {C}}^k_0(\varOmega ,{\mathbb {R}}^d)\rightarrow {\mathcal {C}}^k_0(\varGamma ,{\mathbb {R}}^d)\text {, } V\mapsto V\big \vert _\varGamma \end{aligned}$$

denotes the trace operator and \(\gamma _\varGamma '\) its adjoint operator.

Note that the Hadamard Structure Theorem 1 actually states the existence of a scalar distribution \(r=r(\varOmega )\) on the boundary \(\varGamma \) of a domain \(\varOmega \). However, in this paper, we always assume that r is an integrable function. If \(r\in L^1(\varGamma )\), then r is obtained in the form of the trace on \(\varGamma \) of an element \(G\in W^{1,1}(\varOmega )\). In this case, it follows from (5) that the shape derivative can be expressed more conveniently as

$$\begin{aligned} DJ(\varOmega )[V] =\int _\varGamma r\left<V, n\right> ds. \end{aligned}$$

If the objective functional is given by an integral over the whole domain, the shape derivative can be expressed both as an integral over the domain, the so-called volume or weak formulation, and as an integral over the boundary, the so-called surface or strong formulation:

$$\begin{aligned} DJ_\varOmega [V]&:= \int _\varOmega RV(x)\, dx&\text {(volume/weak formulation)} \end{aligned}$$
(6)
$$\begin{aligned} DJ_\varGamma [V]&:=\int _\varGamma r(s)\left<V(s), n(s)\right> ds&\text {(surface/strong formulation)} \end{aligned}$$
(7)

Here \(r\in L^1(\varGamma )\) and R is a differential operator acting linearly on the vector field V with \(DJ_\varOmega [V]=DJ(\varOmega )[V]=DJ_\varGamma [V]\). Recent advances in PDE constrained optimization on shape manifolds are based on the surface formulation, also called Hadamard form, as well as on intrinsic shape metrics. Major effort in shape calculus has been devoted to such surface expressions (cf. [14, 56]), which are often very tedious to derive. When one derives the shape derivative of an objective functional which is given by an integral over the domain, one first obtains the volume formulation. This volume form can be converted into its surface form by integration by parts. In order to apply this formula, higher regularity of the state and adjoint of the underlying PDE is needed. Recently, it has been shown that the weak formulation has numerical advantages, see, for instance, [8, 19, 25, 48]. Practical advantages of volume shape formulations have also been demonstrated in [35]. However, volume integral forms of shape derivatives require an outer metric on the domain surrounding the shape boundary. In contrast to inner metrics, which can be seen as describing a deformable material that the shape itself is made of, the differential operator governing an outer metric is defined even outside of the shape (cf., e.g., [6, 9, 30, 42]). In [54], both points of view are harmonized by deriving a metric from an outer metric. Based on this metric, efficient shape optimization algorithms, which also reduce the analytical effort so far involved in the derivation of shape derivatives, are proposed in [54, 55, 61]. The next subsection concentrates on the question of how shape calculus, and in particular shape derivatives, can be combined with geometric concepts of shape spaces. This combination results in efficient optimization techniques in shape spaces.
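As a simple illustration of (6) and (7) (a standard textbook example, not specific to the cited works), consider the volume functional \(J(\varOmega )=\int _\varOmega 1\, dx\). Its shape derivative in direction V is

$$\begin{aligned} DJ_\varOmega [V]=\int _\varOmega \mathrm {div}(V)\, dx \qquad \text {and}\qquad DJ_\varGamma [V]=\int _\varGamma \left<V,n\right> ds, \end{aligned}$$

i.e., \(RV=\mathrm {div}(V)\) and \(r\equiv 1\), and the divergence theorem converts the volume formulation into the surface formulation; in this simple case no additional regularity is needed for the conversion.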

2.2 Shape Calculus Combined with Geometric Concepts of Shape Spaces

As pointed out in [51], shape optimization can be viewed as optimization on Riemannian shape manifolds, and the resulting optimization methods can be constructed and analyzed within this framework. This combines algorithmic ideas from [1] with the Riemannian geometrical point of view established in [4]. In this subsection, we analyze the connection of Riemannian geometry on the space of smooth shapes to shape optimization and extend the results in [62], which is also concerned with this connection, to a Riemannian shape Hessian and to a second order optimization method. In particular, we specify the covariant derivative associated with the first Sobolev metric on the space of smooth shapes (cf. Theorem 2 in Sect. 2.2.1), which results in the definition of the Riemannian shape Hessian with respect to this metric (cf. Definition 5 in Sect. 2.2.2). The formulation of the covariant derivative and the shape Hessian opens the door for formulating higher order methods in the space of smooth shapes (cf. Algorithm 2 in Sect. 2.2.2).

2.2.1 The Space of Smooth Shapes

We first introduce the space of smooth shapes and summarize some of its properties which are relevant for this paper from the literature [3,4,5, 32, 41, 42]. First, we concentrate on one-dimensional shapes, which are defined as the images of simple closed smooth curves in the plane \({\mathbb {R}}^2\). Such simple closed smooth curves can be represented by embeddings from the unit circle \(S^1\) into the plane \({\mathbb {R}}^2\), see, for instance, [33]. Therefore, the set of all embeddings from \(S^1\) into \({\mathbb {R}}^2\), denoted by \(\mathrm {Emb}(S^1,{\mathbb {R}}^2)\), represents all simple closed smooth curves in \({\mathbb {R}}^2\). However, note that we are only interested in the shape itself and that images are not changed by re-parametrizations. Thus, all simple closed smooth curves which differ only by re-parametrizations can be considered equal to each other because they lead to the same image. Let \(\mathrm {Diff}(S^1)\) denote the set of all diffeomorphisms from \(S^1\) into itself. This set is a regular Lie group (cf. [32, Chap. VIII, 38.4]) and consists of all the smooth re-parametrizations mentioned above. In [41], the set of all one-dimensional shapes is characterized by

$$\begin{aligned} B_e(S^1,{\mathbb {R}}^2):= \mathrm {Emb}(S^1,{\mathbb {R}}^2)/\mathrm {Diff}(S^1), \end{aligned}$$
(8)

i.e., the orbit space of \(\mathrm {Emb}(S^1,{\mathbb {R}}^2)\) under the action by composition from the right by the Lie group \(\mathrm {Diff}(S^1)\). A particular point on \(B_e(S^1,{\mathbb {R}}^2)\) is represented by a curve \( c:S^1\rightarrow {\mathbb {R}}^2 , \ \theta \mapsto c(\theta ) \) and illustrated in the left picture of Fig. 1. The tangent space is isomorphic to the set of all smooth normal vector fields along c, i.e.,

$$\begin{aligned} T_cB_e(S^1,{\mathbb {R}}^2)\cong \left\{ h:h=\alpha n,\, \alpha \in {\mathcal {C}}^\infty (S^1)\right\} , \end{aligned}$$
(9)

where n denotes the exterior unit normal field to the shape boundary c such that \(n (\theta )\perp c_\theta (\theta )\) for all \(\theta \in S^1\), where \(c_\theta =\frac{\partial c}{\partial \theta }\) denotes the circumferential derivative as in [41]. Since we are dealing with parametrized curves, we have to work with the arc length and its derivative. Therefore, we use the following notation:

$$\begin{aligned} ds&= |c_\theta |d\theta&\text {(arc length with respect to} c) \end{aligned}$$
(10)
$$\begin{aligned} D_s&= \frac{\partial _\theta }{|c_\theta |}&\text {(arc length derivative with respect to} c) \end{aligned}$$
(11)
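For computations on discretized curves, (10) and (11) have straightforward discrete counterparts. The following sketch (our illustration; the central-difference stencil and all names are our own choices) implements the arc length weights and the operator \(D_s\) on a closed curve sampled at \(\theta _i=2\pi i/N\):

```python
import numpy as np

def arc_lengths(c):
    """Quadrature weights ds_i ~ |c_theta(theta_i)| * dtheta, cf. (10)."""
    N = c.shape[0]
    dtheta = 2.0 * np.pi / N
    # periodic central differences for c_theta on the closed curve
    c_theta = (np.roll(c, -1, axis=0) - np.roll(c, 1, axis=0)) / (2.0 * dtheta)
    return np.linalg.norm(c_theta, axis=1) * dtheta

def arc_derivative(c, f):
    """Apply D_s = partial_theta / |c_theta| to samples f along c, cf. (11).

    c is an (N, 2) array of curve points; f is an (N,) or (N, 2) array.
    """
    N = c.shape[0]
    dtheta = 2.0 * np.pi / N
    c_theta = (np.roll(c, -1, axis=0) - np.roll(c, 1, axis=0)) / (2.0 * dtheta)
    speed = np.linalg.norm(c_theta, axis=1)
    f_theta = (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2.0 * dtheta)
    return f_theta / (speed if f.ndim == 1 else speed[:, None])
```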

Remark 2

Some properties of the operator \(D_s\) can be found in, e.g., [42]. In [4], this operator is considered for higher dimensions and its connection with the Bochner-Laplacian is given.

In [32], it is proven that the shape space \(B_e(S^1,{\mathbb {R}}^2)\) is a smooth manifold. Is it perhaps even a Riemannian shape manifold? This question was investigated by Peter W. Michor and David Mumford. They show in [41] that the standard \(L^2\)-metric on the tangent space is too weak because the geodesic distance it induces vanishes identically. This phenomenon is called the vanishing geodesic distance phenomenon. As a remedy, the authors employ a curvature weighted \(L^2\)-metric and prove that the vanishing phenomenon does not occur for this metric. Several Riemannian metrics on this shape space are examined in further publications, e.g., [4, 40, 42]. All these metrics arise from the \(L^2\)-metric by putting weights, derivatives or both into it. In this manner, we get three groups of metrics: the almost local metrics, which arise by putting weights into the \(L^2\)-metric (cf. [5, 42]), the Sobolev metrics, which arise by putting derivatives into the \(L^2\)-metric (cf. [4, 42]), and the weighted Sobolev metrics, which arise by putting both weights and derivatives into the \(L^2\)-metric (cf. [5]). Under suitable assumptions, none of these metrics induces the phenomenon of vanishing geodesic distance. Listing all of these results goes beyond the scope of this paper, but they can be found in the above-mentioned publications. All Riemannian metrics mentioned above are inner metrics. As already mentioned above, this means that the deformation is prescribed on the shape itself and the ambient space stays fixed.

In the following, we clarify briefly how the above-mentioned inner Riemannian metrics can be defined on the shape space \(B_e(S^1,{\mathbb {R}}^2)\). For details we refer to [42]. Moreover, we refer to [3] for a comparison of an inner metric on \(B_e(S^1,{\mathbb {R}}^2)\) with the diffeomorphic matching framework which works with outer metrics.

First, we define a Riemannian metric on the space \(\mathrm {Emb}(S^1,{\mathbb {R}}^2)\), which is a family \(g=\left( g_c(h,k)\right) _{c\in \mathrm {Emb}(S^1,{\mathbb {R}}^2)}\) of inner products \(g_c(h,k)\), where h and k denote vector fields along \(c\in \mathrm {Emb}(S^1,{\mathbb {R}}^2)\). The simplest inner product on the tangent bundle to \(\text {Emb}(S^1,{\mathbb {R}}^2)\) is the standard \(L^2\)-inner product \(g_c(h,k) := \int _{S^1}\left< h,k\right> ds\). Note that

$$\begin{aligned} T_c\text {Emb}(S^1,{\mathbb {R}}^2)\cong {\mathcal {C}}^\infty (S^1,{\mathbb {R}}^2) \qquad \forall \, c\in \text {Emb}(S^1,{\mathbb {R}}^2) \end{aligned}$$
(12)

and that a tangent vector \(h\in T_c\text {Emb}(S^1,{\mathbb {R}}^2)\) has an orthogonal decomposition into a smooth tangential component \(h^\top \) and a normal component \(h^\perp \) (cf. [41, Sect. 3, 3.2]). In particular, \(h^\perp \) is an element of the bundle \({\mathcal {N}}_c\) of tangent vectors which are normal to the \(\text {Diff}(S^1)\)-orbits. This normal bundle is well defined and is a smooth vector subbundle of the tangent bundle. In [41], it is outlined how the restriction of the metric \(g_c\) to the subbundle \({\mathcal {N}}_c\) gives the quotient metric. The quotient metric induced by the \(L^2\)-metric is given by

$$\begin{aligned} \begin{aligned} g^0:T_cB_e(S^1,{\mathbb {R}}^2)\times T_cB_e(S^1,{\mathbb {R}}^2)&\rightarrow {\mathbb {R}}\text {, } \\ (h,k)&\mapsto \int _{S^1}\left<\alpha ,\beta \right> ds, \end{aligned} \end{aligned}$$
(13)

where \(h=\alpha n\) and \(k=\beta n\) denote two elements of the tangent space \(T_cB_e(S^1,{\mathbb {R}}^2)\) given in (9). Unfortunately, as shown in [41] and already mentioned above, this \(L^2\)-metric induces vanishing geodesic distance.

For the following discussion, among all the above-mentioned Riemannian metrics, we pick the first Sobolev metric, which does not induce the phenomenon of vanishing geodesic distance (cf. [42]). On \(B_e(S^1,{\mathbb {R}}^2)\), it is defined as follows:

Definition 3

(First Sobolev metric on \(B_e(S^1,{\mathbb {R}}^2)\)) The first Sobolev metric on \(B_e(S^1,{\mathbb {R}}^2)\) is given by

$$\begin{aligned} \begin{aligned} g^1:T_cB_e(S^1,{\mathbb {R}}^2)\times T_cB_e(S^1,{\mathbb {R}}^2)&\rightarrow {\mathbb {R}}\text {, } \\ (h,k)&\mapsto \int _{S^1}\left<h,k\right>+A\left<D_sh,D_sk\right> ds, \end{aligned} \end{aligned}$$
(14)

where \(A>0\) and \(D_s\) denotes the arc length derivative with respect to c defined in (11).
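In the discrete setting of the previous sketch, \(g^1\) can be approximated by a weighted sum over the nodes (again our illustration; it assumes that arc_lengths and arc_derivative from the sketch after (11) are in scope):

```python
import numpy as np

def g1(c, h, k, A=0.1):
    """Approximate g^1(h, k) = int <h,k> + A <D_s h, D_s k> ds along c, cf. (14)."""
    ds = arc_lengths(c)                                    # quadrature weights, cf. (10)
    Dsh, Dsk = arc_derivative(c, h), arc_derivative(c, k)  # cf. (11)
    return float(np.sum((np.sum(h * k, axis=1)
                         + A * np.sum(Dsh * Dsk, axis=1)) * ds))
```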

An essential operation in Riemannian geometry is the covariant derivative. In differential geometry, it is often written in terms of the Christoffel symbols. In [4], Christoffel symbols associated with the Sobolev metrics are provided. However, in order to provide a relation to shape calculus, another representation of the covariant derivative in terms of the Sobolev metric \(g^1\) is needed. This brings us to the first main theorem of this paper. The Riemannian connection provided by this theorem makes it possible to specify the Riemannian shape Hessian.

Theorem 2

Let \(A>0\) and let \(h,m\in T_c\mathrm{Emb}(S^1,{\mathbb {R}}^2)\) denote vector fields along \(c\in \mathrm{Emb}(S^1,{\mathbb {R}}^2)\). The arc length derivative with respect to c is denoted by \(D_s\) as in (11). Moreover, \(L_1:= I-AD_s^2\) is a differential operator on \({\mathcal {C}}^\infty (S^1,{\mathbb {R}}^2)\) and \(L_1^{-1}\) denotes its inverse operator. The covariant derivative associated with the Sobolev metric \(g^1\) can be expressed as

$$\begin{aligned} \nabla _m h=L_1^{-1}(K_1(h))\text { with }K_1 := \frac{1}{2}\left<D_s m,v\right>\left( I+AD_s^2\right) , \end{aligned}$$
(15)

where \(v=\frac{c_\theta }{|c_\theta |}\) denotes the unit tangent vector.

Remark 3

The inverse operator \(L_1^{-1}\) in Theorem 2 is an integral operator whose kernel has an expression in terms of the arc length distance between two points on a curve and their unit normal vectors (cf. [42]). For the existence and more details about \(L_1^{-1}\) we refer to [42].

Proof of Theorem 2

Let h, k, m be vector fields on \({\mathbb {R}}^2\) along \(c\in \text {Emb}(S^1,{\mathbb {R}}^2)\). Moreover, \(d(\cdot )[m]\) denotes the directional derivative in direction m. From [42, Sect. 4.2, formula (3)], we have

$$\begin{aligned} d(L_1(h))[m]=A\left<D_sm,v\right>D_s^2h+AD_s\left<D_sm,v\right>D_sh. \end{aligned}$$
(16)

Applying (16), we obtain in analogy to the computations in [42, Sect. 4.2]

$$\begin{aligned} \begin{aligned}&d\left( g_c^1(h,k)\right) [m] =d\left( \int _{S^1}\left<L_1(h),k\right>ds\right) [m] \\&=\int _{S^1}\left<d\left( L_1(h)\right) [m],k\right>ds+\int _{S^1}\left<L_1(h),k\right>\left<D_s m,v\right>ds \\&{\mathop {=}\limits ^{(16)}}\int _{S^1}\left<A\left<D_sm,v\right>D_s^2h+AD_s\left<D_sm,v\right>D_sh,k\right>ds\\&\quad +\int _{S^1}\left<L_1(h),k\right>\left<D_s m,v\right>ds. \end{aligned} \end{aligned}$$
(17)

Since the differential operator \(D_s\) is anti-self-adjoint with respect to the \(L^2\)-metric \(g^0\), i.e.,

$$\begin{aligned} \int _{S^1} \left<D_sh,k\right>ds=\int _{S^1} \left<h,-D_sk\right>ds, \end{aligned}$$
(18)

we get from (17)

$$\begin{aligned} \begin{aligned} d\left( g_c^1(h,k)\right) [m]&=\int _{S^1}2A\left<D_s m,v\right>\left<D_s^2h,k\right>ds+\int _{S^1}\left<h,k\right>\left<D_s m,v\right>ds\\&-\int _{S^1}A\left<D_s^2h,k\right>\left<D_s m,v\right>ds\\&=\int _{S^1}\left<D_s m,v\right>\left( \left<h,k\right>+A\left<D_s^2h,k\right>\right) ds . \end{aligned} \end{aligned}$$
(19)

Now, we proceed analogously to the proof of Theorem 2.1 in [51], which exploits the product rule for Riemannian connections. Thus, we conclude from

$$\begin{aligned}&d\left( g_c^1(h,k)\right) [m]\\&{\mathop {=}\limits ^{(19)}}\int _{S^1}\left<D_s m,v\right>\left[ \frac{1}{2}\left( \left<h,k\right>+A\left<D_s^2h,k\right>\right) +\frac{1}{2}\left( \left<h,k\right>+A\left<D_s^2h,k\right>\right) \right] ds\\&{\mathop {=}\limits ^{{(18)}}}\int _{S^1}\left<\frac{1}{2}\left<D_s m,v\right>\left( I+AD_s^2\right) h,k\right>+\left<h,\frac{1}{2}\left<D_s m,v\right>\left( I+AD_s^2\right) k\right>ds\\&\quad =\int _{S^1}\left<L_1\left[ L_1^{-1}\left( \frac{1}{2}\left<D_s m,v\right>\left( I+AD_s^2\right) h\right) \right] ,k\right>ds\\&\qquad +\int _{S^1}\left<h,L_1\left[ L_1^{-1}\left( \frac{1}{2}\left<D_s m,v\right>\left( I+AD_s^2\right) k\right) \right] \right>ds\\&\quad =g_c^1\left( L_1^{-1}\left( \frac{1}{2}\left<D_s m,v\right>\left( I+AD_s^2\right) h\right) ,k\right) \\&\qquad +g_c^1\left( h,L_1^{-1}\left( \frac{1}{2}\left<D_s m,v\right>\left( I+AD_s^2\right) k\right) \right) \end{aligned}$$
that the covariant derivative associated with \(g^1\) is given by (15). \(\square \)

[Fig. 1: Examples of one- and two-dimensional shapes]

Remark 4

For the sake of completeness it should be mentioned that the shape space \(B_e(S^1,{\mathbb {R}}^2)\) and its theoretical results can be generalized to higher dimensions. Let M be a compact manifold and let N denote a Riemannian manifold with \(\text {dim}(M)<\text {dim}(N)\). In [40], the space of all submanifolds of type M in N is defined by

$$\begin{aligned} B_e(M,N):=\text {Emb}(M,N)/\text {Diff}(M). \end{aligned}$$
(20)

Figure 1 also illustrates a two-dimensional shape which is an element of the shape space \(B_e(S^2,{\mathbb {R}}^3)\), as well as a two-dimensional shape which is not an element of this shape space. Note that the vanishing geodesic distance phenomenon also occurs for the \(L^2\)-metric in higher dimensions, as verified in [40]. For the definition of the Sobolev metric \(g^1\) in higher dimensions we refer to [4].

2.2.2 Optimization in the Space of Smooth Shapes

In the following, we focus on two Riemannian metrics on the space of smooth shapes \(B_e\), the first Sobolev metric \(g^1\) introduced in Sect. 2.2.1 and the Steklov–Poincaré metric \(g^S\) defined below. The aim of this subsection is to provide some optimization techniques in \((B_e,g^1)\) and \((B_e,g^S)\).

The subsection is structured in three paragraphs. The first paragraph considers the first Sobolev metric \(g^1\), for which we first recall some relevant results from [62]. Afterwards, we build on our findings of the previous subsection and extend the results in [62]. More precisely, thanks to the specification of the covariant derivative associated with \(g^1\) in the previous subsection, we are able to define the Riemannian shape Hessian with respect to \(g^1\) and formulate the Newton method in \((B_e,g^1)\), which is based on this definition. As we will see below, if we consider Sobolev metrics, we have to deal with surface formulations of shape derivatives. An intermediate and equivalent result in the process of deriving these expressions is the volume expression, as already mentioned above. Volume expressions are often preferable to surface forms: they save analytical as well as programming effort, and transforming volume into surface forms usually requires additional regularity assumptions. However, in the case of the more attractive volume formulation, the shape manifold \(B_e\) and the corresponding inner product \(g^1\) are not appropriate. One possible approach to using volume forms is addressed in the second paragraph of this subsection, which considers Steklov–Poincaré metrics. We summarize some of the main results related to this metric from [54] with a view to optimization methods. Finally, the third paragraph of this subsection considers a specific example and concludes with a brief discussion of the two approaches resulting from the first Sobolev and the Steklov–Poincaré metric. Since this paper does not focus on numerical investigations, we pick an example which is already implemented in [54, 61] to illustrate the main differences between the two approaches.

Optimization based on first Sobolev metrics

We consider the Sobolev metric \(g^1\) on the shape space \(B_e\). In particular, this means that in the following we consider elements of \(B_e\), i.e., smooth boundaries \(\varGamma \) of the domain \(\varOmega \) under consideration. The Riemannian connection with respect to this metric, which is given in Theorem 2, makes it possible to specify the Riemannian shape Hessian of an optimization problem.

First, we detail the Riemannian shape gradient from [62]. Due to the Hadamard Structure Theorem, there exists a scalar distribution r on the boundary \(\varGamma \) of the domain \(\varOmega \) under consideration. If we assume \(r\in L^1(\varGamma )\), the shape derivative can be expressed on the boundary \(\varGamma \) of \(\varOmega \) (cf. (7)). The distribution r is often called the shape gradient in the literature. However, note that gradients always depend on the chosen scalar products or metrics defined on the space under consideration. If we want to optimize on a shape manifold, we have to find a representation of the shape gradient with respect to a Riemannian metric defined on the shape manifold under consideration. This representation is called the Riemannian shape gradient. The shape derivative can be expressed more concisely as

$$\begin{aligned} DJ_\varGamma [V]=\int _\varGamma \alpha r \ ds \end{aligned}$$
(21)

if \(V\big \vert _\varGamma =\alpha n\) with \(\alpha \in {\mathcal {C}}^\infty (\varGamma )\). In order to get an expression of the Riemannian shape gradient with respect to the Sobolev metric \(g^1\), we look at the isomorphism (9). Due to this isomorphism, a tangent vector \(h\in T_\varGamma B_e\) is given by \(h=\alpha n\) with \(\alpha \in {\mathcal {C}}^\infty (\varGamma )\). This leads to the following definition.

Definition 4

(Riemannian shape gradient with respect to the first Sobolev metric) A Riemannian representation of the shape derivative, i.e., the Riemannian shape gradient of a shape differentiable objective function J in terms of the first Sobolev metric \(g^1\), is given by

$$\begin{aligned} \text {grad}(J)=qn \ \text { with } \ (I-AD_s^2)\,qn=rn, \end{aligned}$$
(22)

where \(D_s\) is the arc length derivative with respect to \(\varGamma \in B_e\), \(A>0\), \(q\in {\mathcal {C}}^\infty (\varGamma )\) and r denotes the function in the shape derivative representation (7) for which we assume \(r\in {\mathcal {C}}^\infty (\varGamma )\).
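Numerically, (22) amounts to one linear solve on the discretized boundary. The sketch below (ours; it assembles \(D_s\) as a matrix built from periodic central differences and uses a dense solver in place of the integral operator \(L_1^{-1}\) discussed in Remark 3) computes the coefficient q from r:

```python
import numpy as np

def sobolev_gradient_scalar(c, r, A=0.1):
    """Solve the discrete system (I - A D_s^2) q = r, cf. (22)."""
    N = c.shape[0]
    dtheta = 2.0 * np.pi / N
    c_theta = (np.roll(c, -1, axis=0) - np.roll(c, 1, axis=0)) / (2.0 * dtheta)
    speed = np.linalg.norm(c_theta, axis=1)
    # periodic central-difference matrix for d/dtheta: D[i, i+1] = +1, D[i, i-1] = -1
    D = (np.roll(np.eye(N), -1, axis=0) - np.roll(np.eye(N), 1, axis=0)) / (2.0 * dtheta)
    Ds = D / speed[:, None]              # D_s = diag(1/|c_theta|) d/dtheta, cf. (11)
    L1 = np.eye(N) - A * (Ds @ Ds)       # discrete L_1 = I - A D_s^2
    return np.linalg.solve(L1, r)
```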

Next, we specify the Riemannian shape Hessian with respect to the first Sobolev metric. It is based on the Riemannian connection \(\nabla \) related to the Sobolev metric \(g^1\) given in one of the main theorems of this paper, Theorem 2, as well as on the Riemannian shape gradient definition, Definition 4. In analogy to [1], we can define the Riemannian shape Hessian as follows:

Definition 5

(Riemannian shape Hessian with respect to the first Sobolev metric) Let \(\nabla \) be the covariant derivative associated with the Sobolev metric \(g^1\). The Riemannian shape Hessian with respect to the Sobolev metric \(g^1\) of a two times shape differentiable objective function J is defined as the linear mapping

$$\begin{aligned} T_\varGamma B_e \rightarrow T_\varGamma B_e \text {, } h \mapsto \text {Hess}(J)[h]:= \nabla _h \text {grad}(J). \end{aligned}$$
(23)

The Riemannian shape gradient and the Riemannian shape Hessian with respect to the Sobolev metric \(g^1\) are required to apply first and second order optimization methods in the shape space \((B_e,g^1)\). The gradient method is an example of a first order optimization method. If we apply the gradient method to (1) and consider \(g^1\) on \(B_e\), we need to compute the Riemannian shape gradient with respect to \(g^1\) from (22). The negative gradient is then used as descent direction for the objective functional J. An example of a second order method is the Newton method. If we apply the Newton method to (1), we need to solve—similarly to standard non-linear programming—the problem of finding \(\varGamma \in B_e\) with

$$\begin{aligned} \text {grad}(J(\varOmega ))=0, \end{aligned}$$
(24)

where \(\varGamma \) denotes the boundary of \(\varOmega \) and \(\text {grad}(J)\) is the gradient with respect to \(g^1\) (cf. (22)).

In general, the calculations of optimization methods on manifolds have to be performed in tangent spaces. This means that points from a tangent space have to be mapped to the manifold in order to get a new iterate. More precisely, given a tangent vector, we follow the geodesic starting at the corresponding point of the manifold in that direction for a step length determined by the optimization process. The computation of the Riemannian exponential map, which is the theoretically superior choice of such a mapping, is prohibitively expensive in most applications. However, in [1], it is shown that a so-called retraction, which is a first-order approximation of the exponential map, is sufficient.

Definition 6

(Retraction) A retraction on a manifold M is a smooth mapping \({\mathcal {R}}:TM\rightarrow M\) with the following properties:

  (i)

    \({\mathcal {R}}_p(0_p)=p\), where \({\mathcal {R}}_p\) denotes the restriction of \({\mathcal {R}}\) to \(T_pM\) and \(0_p\) denotes the zero element of \(T_pM\).

  (ii)

    \(d{\mathcal {R}}_p(0_p)=\text {id}_{T_pM}\), where \(\text {id}_{T_pM}\) denotes the identity mapping on \(T_pM\) and \(d{\mathcal {R}}_p(0_p)\) denotes the pushforward of \(0_p\in T_pM\) by \({\mathcal {R}}\).

For example, in \(B_e(S^1,{\mathbb {R}}^2)\), for sufficiently small perturbations \(\alpha \in {\mathcal {C}}^\infty (S^1)\), a retraction \({\mathcal {R}}\) is defined by

$$\begin{aligned} {\mathcal {R}}_c(\eta _c):= c+\eta _c, \end{aligned}$$
(25)

where \(\eta _c\in T_cB_e(S^1,{\mathbb {R}}^2)\) and \( c+\eta _c:S^1 \rightarrow {\mathbb {R}}^2,\, \theta \mapsto c(\theta )+\alpha (\theta )n(c(\theta )) \).
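On a discretized curve, the retraction (25) is simply a nodewise update along the normal field. A possible sketch (ours; the outward normal is obtained by rotating the unit tangent, assuming a counter-clockwise parametrization):

```python
import numpy as np

def retract(c, alpha):
    """R_c(alpha * n): move every node of c by alpha along the outward normal, cf. (25)."""
    N = c.shape[0]
    dtheta = 2.0 * np.pi / N
    c_theta = (np.roll(c, -1, axis=0) - np.roll(c, 1, axis=0)) / (2.0 * dtheta)
    v = c_theta / np.linalg.norm(c_theta, axis=1)[:, None]  # unit tangent field
    n = np.column_stack([v[:, 1], -v[:, 0]])                # outward unit normal (CCW curve)
    return c + alpha[:, None] * n
```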

Now, we are able to formulate the gradient method in \((B_e,g^1)\) (cf. Algorithm 1). Thanks to the definition of the Riemannian shape Hessian \(\text {Hess}(J)\) in \((B_e,g^1)\), which is based on the Riemannian connection \(\nabla \) given in (15), we can also formulate the Newton method in \((B_e,g^1)\) (cf. Algorithm 2). We require the function r of the surface shape derivative (cf. (7)) in both algorithms. This function is needed to compute the shape gradient with respect to \(g^1\) in each iteration. In Algorithm 2, both the Riemannian shape gradient and the Riemannian shape Hessian with respect to \(g^1\) are required. For a PDE constrained shape optimization problem, the Newton method can be applied to find stationary points of the Lagrangian of the optimization problem, which leads to the Lagrange–Newton method.

[Algorithm 1: Gradient method in \((B_e,g^1)\)]
[Algorithm 2: Newton method in \((B_e,g^1)\)]
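To illustrate how Algorithm 1 operates, the following toy loop (our own construction, not the implementation from [54, 61]) applies the gradient method in \((B_e,g^1)\) to the perimeter functional \(J(\varGamma )=\int _\varGamma 1\, ds\), whose surface shape derivative density is the curvature, \(r=\kappa \). It reuses arc_lengths, arc_derivative, sobolev_gradient_scalar and retract from the sketches above, and it uses a fixed step size instead of a line search.

```python
import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
radius = 1.0 + 0.3 * np.cos(5 * theta)            # a flower-shaped initial curve
c = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

A, step = 0.1, 0.01
for k in range(30):
    v = arc_derivative(c, c)                       # unit tangent D_s c
    n_in = np.column_stack([-v[:, 1], v[:, 0]])    # inward normal (CCW curve)
    kappa = np.sum(arc_derivative(c, v) * n_in, axis=1)  # curvature via D_s v = kappa * n_in
    q = sobolev_gradient_scalar(c, kappa, A)       # grad(J) = q * n, cf. (22)
    c = retract(c, -step * q)                      # descent step via the retraction (25)
    if k % 10 == 0:
        print(k, float(np.sum(arc_lengths(c))))    # the perimeter decreases
```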

Optimization based on Steklov–Poincaré metrics

Gradients with respect to \(g^1\) are based on surface expressions of shape derivatives, as can be seen in (22), where r is the function in the surface shape derivative representation (7). As outlined at the beginning of this section, volume expressions are preferable to surface forms. One possible approach to using volume forms is to consider Steklov–Poincaré metrics \(g^S\) (cf. [54]). In the following, we summarize some of the main results related to this metric from [54] with a view to first order optimization approaches in the space of smooth shapes. In order to formulate higher order methods in \((B_e,g^S)\), an explicit expression of the covariant derivative with respect to the Steklov–Poincaré metric would be necessary. The derivation of such an expression, as well as the formulation and investigation of higher order methods in \((B_e,g^S)\), is beyond the scope of this paper and left for future work.

In the following, we need to deal with Lipschitz boundaries. Since there are several competing conditions which are used to define a Lipschitz boundary, we first specify its definition:

Definition 7

(\({\mathcal {C}}^{k,r}\)-boundary, Lipschitz boundary) Let \(\varOmega \subset {\mathbb {R}}^d\) be open with boundary \(\varGamma =\partial \varOmega \). Moreover, let \(k\in \overline{{\mathbb {N}}}\) and \( {\mathcal {C}}^{k,r}({\overline{\varOmega }})\) denote the set of \({\mathcal {C}}^k\)-functions which are Hölder-continuous with exponent \(r\in [0,1]\). Further, \(B_d(x,R)\) denotes the ball in \({\mathbb {R}}^d\) centered at \(x\in {\mathbb {R}}^d\) with radius \(R>0\). We say \(\varOmega \) has a \({\mathcal {C}}^{k,r}\)-boundary or \(\varOmega \) is \({\mathcal {C}}^{k,r}\) if for any \(x\in \varGamma \) there exist local coordinates \(y_1,\ldots ,y_d\) centered at x, i.e., such that x is the unique solution of \(y_1=\dots =y_d=0\), and constants \(a,b>0\) as well as a mapping \(\psi \in {\mathcal {C}}^{k,r}( B_{d-1}(x,a))\), where \(B_{d-1}(x,a)\) is considered in the linear subspace defined by \((y_1,\ldots ,y_{d-1})\), subject to the following conditions:

  (i)

    \(y_d=\psi ({\widetilde{y}}) \Rightarrow ({\widetilde{y}},y_d)\in \varGamma \),

  (ii)

    \(\psi ({\widetilde{y}})<y_d<\psi ({\widetilde{y}})+b \Rightarrow ({\widetilde{y}},y_d)\in \varOmega \),

  (iii)

    \(\psi ({\widetilde{y}})-b<y_d<\psi ({\widetilde{y}}) \Rightarrow ({\widetilde{y}},y_d)\not \in {\overline{\varOmega }}\).

We say \(\varOmega \) has a Lipschitz boundary, or \(\varOmega \) is Lipschitz, if \(\varOmega \) has a \({\mathcal {C}}^{0,1}\)-boundary, i.e., if the mappings \(\psi \) can be chosen Lipschitz continuous.

Definition 8

(Steklov–Poincaré metric) Let \(\varOmega \subset X\subset {\mathbb {R}}^d\) be a compact domain with \(\varOmega \ne \emptyset \) and Lipschitz-boundary \(\varGamma :=\partial \varOmega \), where X denotes a bounded domain with Lipschitz-boundary \(\varGamma _\text {out}:=\partial X\). The Steklov–Poincaré metric is given by

$$\begin{aligned} \begin{aligned} g^S:H^{1/2}(\varGamma )\times H^{1/2}(\varGamma )&\rightarrow {\mathbb {R}},\\ (\alpha ,\beta )&\mapsto \int _{\varGamma } \alpha (s)\cdot [(S^{pr})^{-1}\beta ](s)\ ds. \end{aligned} \end{aligned}$$
(28)

Here \(S^{pr}\) denotes the projected Poincaré–Steklov operator which is given by

$$\begin{aligned} S^{pr}:H^{-1/2}(\varGamma ) \rightarrow H^{1/2}(\varGamma ),\ \alpha \mapsto (\gamma _0 U)^T n, \end{aligned}$$
(29)

where \(\gamma _0:H^1_0(X,{\mathbb {R}}^d) \rightarrow H^{1/2}(\varGamma ,{\mathbb {R}}^d)\) denotes the trace operator on \(\varGamma \) and \(U\in H^1_0(X,{\mathbb {R}}^d)\) solves the Neumann problem

$$\begin{aligned} a(U,V)=\int _{\varGamma } \alpha \cdot (\gamma _0 V)^T n\ ds\quad \forall V\in H^1_0(X,{\mathbb {R}}^d) \end{aligned}$$
(30)

with \(a(\cdot ,\cdot )\) being a symmetric and coercive bilinear form.

Remark 5

Note that a Steklov–Poincaré metric depends on the choice of the bilinear form. Thus, different bilinear forms lead to various Steklov–Poincaré metrics.

Next, we state the connection of the shape space \(B_e\), equipped with the Steklov–Poincaré metric \(g^S\), to shape calculus. As already mentioned, the shape derivative can be expressed as the surface integral (7) due to the Hadamard Structure Theorem, and it can be written more concisely as in (21). Due to the isomorphism (9) and the expression (21), this connection can be stated as follows.

Definition 9

(Shape gradient with respect to Steklov–Poincaré metric) Let \(r\in {\mathcal {C}}^\infty (\varGamma )\) denote the function in the shape derivative expression (7). Moreover, let \(S^{pr}\) be the projected Poincaré-Steklov operator and let \(\gamma _0\) be as in Definition 8. A representation \(h\in T_{\varGamma } B_e\cong {\mathcal {C}}^\infty (\varGamma )\) of the shape gradient in terms of \(g^S\) is determined by

$$\begin{aligned} g^S(\phi ,h)=\left( r,\phi \right) _{L^2(\varGamma )} \quad \forall \phi \in {\mathcal {C}}^\infty (\varGamma ), \end{aligned}$$
(31)

which is equivalent to

$$\begin{aligned} \int _{\varGamma } \phi (s)\cdot [(S^{pr})^{-1}h](s) \ ds=\int _{\varGamma } r(s)\phi (s) \ ds \quad \forall \phi \in {\mathcal {C}}^\infty (\varGamma ). \end{aligned}$$
(32)

Remark 6

In Definition 9, the isomorphism \(T_{\varGamma } B_e\cong {\mathcal {C}}^\infty (\varGamma )\) is used. It is worth mentioning that, for example, identifying \(\varGamma \) with the corresponding embedding of the circle leads to this isomorphism. In particular, (9) and (12) should be kept in mind.

Now, the shape gradient with respect to the Steklov–Poincaré metric is defined. This enables the formulation of optimization methods in \(B_e\) which involve volume formulations of shape derivatives. From (32) we get \(h=S^{pr}r=(\gamma _0 U)^T n\), where \(U\in H^1_0(X,{\mathbb {R}}^d)\) solves

$$\begin{aligned} a(U,V)=\int _{\varGamma } r\cdot (\gamma _0 V)^T n \ ds =DJ_{\varGamma }[V]=DJ_\varOmega [V] \quad \forall V\in H^1_0(X,{\mathbb {R}}^d) \end{aligned}$$
(33)

with \(a(\cdot ,\cdot )\) being the symmetric and coercive bilinear form on \(H_0^1(X,{\mathbb {R}}^d) \times H_0^1(X,{\mathbb {R}}^d)\) of the Steklov–Poincaré metric definition (cf. (30)). The identity (33) opens the door to considering volume expressions of shape derivatives to compute the shape gradient with respect to \(g^S\). In order to compute the shape gradient, we have to solve

$$\begin{aligned} a(U, V) = b(V) \quad \forall V\in H^1_0(X,{\mathbb {R}}^d) \end{aligned}$$
(34)

with \(b(\cdot )\) being a linear form and given by

$$\begin{aligned} b(V):=DJ_\text {vol}(\varOmega )[V]+DJ_\text {surf}(\varOmega )[V]. \end{aligned}$$

Here \(J_\text {surf}(\varOmega )\) denotes the parts of the objective function leading to surface shape derivative expressions, e.g., perimeter regularizations, which are incorporated as Neumann boundary conditions in equation (34). Parts of the objective function leading to volume shape derivative expressions are denoted by \(J_\text {vol}(\varOmega )\). The bilinear form \(a(\cdot ,\cdot )\) can be chosen, e.g., as the weak form of the linear elasticity equation. More details can be found below, in the paragraph about the comparison of Algorithms 1 and 3.

Remark 7

Note that it is not ensured that \(U\in H^1_0(X,{\mathbb {R}}^d)\) is \({\mathcal {C}}^\infty \). Thus, \(h=S^{pr}r=(\gamma _0 U)^T n\) is not necessarily an element of \(T_{\varGamma }B_e\). However, under special assumptions on the coefficients of the second-order partial differential operator and on the right-hand side of the PDE, a weak solution U which is at least \(H^1_0\)-regular is \({\mathcal {C}}^\infty \) (cf. [17, Sect. 6.3, Theorem 6]).

Thanks to the definition of the gradient with respect to \(g^S\), we are able to formulate the gradient method in \((B_e,g^S)\) (cf. Algorithm 3). We compute the Riemannian shape gradient with respect to \(g^S\) from (34). The negative solution \(-U\) is then used as descent direction for the objective functional J. In Algorithm 3, in order to be in line with the above theory, it is assumed that in each iteration k, the shape \(\varGamma ^k\) is a subset of a general surrounding space X, which is assumed to be a bounded domain with Lipschitz boundary as illustrated in Fig. 2.

[Algorithm 3: Gradient method in \((B_e,g^S)\)]

Comparison of Algorithm 1 and Algorithm 3

We conclude this section with a brief discussion of Algorithms 1 and 3. Since this paper does not focus on numerical investigations, we pick an example which is already implemented in [54, 61]. We briefly summarize numerical results observed in [54, 61] in order to illustrate how the algorithms work. Additionally, we discuss the main differences between the two approaches.

Let \(\varOmega \subset X\subset {\mathbb {R}}^2\) be a domain with \( \partial \varOmega = \varGamma \), where X denotes a bounded domain with Lipschitz-boundary \(\varGamma _\text {out}:=\partial X\). In contrast to the outer boundary \(\varGamma _\text {out}\), which is assumed to be fixed and partitioned into \(\varGamma _\text {out}:=\varGamma _{\text {bottom}} \sqcup \varGamma _{ \text {left}} \sqcup \varGamma _{\text {right}} \sqcup \varGamma _{\text {top}}\) (here, \(\sqcup \) denotes the disjoint union), the inner boundary \(\varGamma \), which is also called the interface, is variable. Let the interface \(\varGamma \) be an element of \(B_e(S^1,{\mathbb {R}}^2)\). Note that X depends on \(\varGamma \); thus, we denote it by \(X(\varGamma )\). Figure 2 illustrates this situation. We consider the following parabolic PDE constrained interface problem (cf. [54, 61]):

$$\begin{aligned}&\min _{\varGamma } \int _{0}^{T} \int _{X(\varGamma )} (y-{\overline{y}})^2 dxdt+\mu \int _{\varGamma }1ds \end{aligned}$$
(36)
$$\begin{aligned} \text{ s.t. } \frac{\partial y}{\partial t}- \mathrm {div}(k\nabla y)&=f\quad \text {in }X(\varGamma )\times (0,T] \end{aligned}$$
(37)
$$\begin{aligned} y&=1\quad \text {on }\varGamma _{\text {top}}\times (0,T] \end{aligned}$$
(38)
$$\begin{aligned} \frac{\partial y}{\partial n}&=0\quad \text {on }\left( \varGamma _{\text {bottom}} \cup \varGamma _{ \text {left}} \cup \varGamma _{\text {right}}\right) \times (0,T] \end{aligned}$$
(39)
$$\begin{aligned} y&=y_0\quad \text {in }X(\varGamma )\times \{0\} \end{aligned}$$
(40)

with

$$\begin{aligned} k:={\left\{ \begin{array}{ll} k_1 = \mathrm {const.}\quad \text { in }X\setminus {\overline{\varOmega }}\times (0,T]\\ k_2 = \mathrm {const.} \quad \text { in }\varOmega \times (0,T] \end{array}\right. } \end{aligned}$$

denoting a jumping coefficient, n being the unit outward normal vector to \(\varOmega \) and \({\bar{y}}\in H^1(X(\varGamma ))\) representing data measurements. The second term in the objective function (36) is a perimeter regularization with \(\mu >0\). Please note that formulation (37) of the PDE has to be understood only formally because of the jumping coefficient k.

[Fig. 2: Example of the domain X with \(\varOmega \subset X\subset {\mathbb {R}}^2\)]

In order to solve the shape optimization problem (36)–(40), we first need to solve the underlying PDE, the so-called state equation. The solution of the parabolic boundary value problem (37)–(40) is obtained by discretizing its weak formulation with standard linear finite elements in space and an implicit Euler scheme in time. The diffusion parameter k is discretized as a piecewise constant function. Figure 3 shows an example of an initial shape geometry, where the domain is discretized with a fine and a coarse finite element mesh. Besides the underlying PDE, we also need to solve the adjoint problem corresponding to the shape optimization problem (36)–(40), which is given in our example by

$$\begin{aligned} -\frac{\partial p}{\partial t}-\mathrm {div}(k\nabla p)&=-(y-{\overline{y}}) \quad \text {in }X(\varGamma ) \times [0,T) \end{aligned}$$
(41)
$$\begin{aligned} p&= 0\quad \text {in }X(\varGamma ) \times \{T\} \end{aligned}$$
(42)
$$\begin{aligned} \frac{\partial p}{\partial n}&=0\quad \text {on }\left( \varGamma _{\text {bottom}} \cup \varGamma _{ \text {left}} \cup \varGamma _{\text {right}}\right) \times [0,T) \end{aligned}$$
(43)
$$\begin{aligned} p&=0\quad \text {on }\varGamma _{\text {top}}\times [0,T) \end{aligned}$$
(44)

and which can be discretized in the same way as the state equation.
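To make the time discretization concrete, the following much simplified sketch (ours; a one-dimensional finite-difference analogue rather than the two-dimensional finite element setting of [54, 61]) applies the implicit Euler scheme to \(y_t-(ky_x)_x=f\) with a piecewise constant jumping coefficient k, a Dirichlet condition on one part of the boundary and a homogeneous Neumann condition on the other:

```python
import numpy as np

N, T_end, steps = 100, 1.0, 50
x = np.linspace(0.0, 1.0, N + 1)
h, dt = x[1] - x[0], T_end / steps
# piecewise constant diffusion coefficient, jumping at the interface x = 0.5
k_face = np.where(0.5 * (x[:-1] + x[1:]) < 0.5, 1.0, 10.0)

# finite-difference operator for -(k y_x)_x on the interior nodes
Asys = np.zeros((N + 1, N + 1))
for i in range(1, N):
    Asys[i, i - 1] = -k_face[i - 1] / h**2
    Asys[i, i] = (k_face[i - 1] + k_face[i]) / h**2
    Asys[i, i + 1] = -k_face[i] / h**2

M = np.eye(N + 1) + dt * Asys                # implicit Euler: (I + dt*A) y_new = y + dt*f
M[0, :] = 0.0; M[0, 0], M[0, 1] = 1.0, -1.0  # homogeneous Neumann: y_0 = y_1
M[N, :] = 0.0; M[N, N] = 1.0                 # Dirichlet: y_N = 1

y = np.zeros(N + 1)                          # initial condition y(., 0) = 0
f = np.ones(N + 1)                           # source term
for n in range(steps):
    rhs = y + dt * f
    rhs[0], rhs[N] = 0.0, 1.0                # right-hand sides of the boundary rows
    y = np.linalg.solve(M, rhs)
```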

Remark 8

In general, the solutions of the state and the adjoint equation are needed in Algorithms 1 and 3 because they are part of the shape derivative of the objective functional.

We use the retraction given in (25) in order to update the shapes according to Algorithm 1 (cf. (26)) and Algorithm 3 (cf. (35)), respectively. This retraction is closely related to the perturbation of identity defined on the domain X. Given a starting shape \(\varGamma ^k\) in the k-th iteration of Algorithm 3, the perturbation of identity acting on the domain X in the direction \(U^k\), where \(U^k\) solves (34), gives

$$\begin{aligned} X(\varGamma ^{k+1})=\{x+t^kU^k(x):x\in X\}, \end{aligned}$$
(45)

i.e., the vector field \(U^k\) weighted by a step size \(t^k\) is added as a deformation to all nodes of the finite element mesh. One also calls \(U^k\) a mesh deformation (field). Here, the volume form allows us to optimize directly over the domain X containing \(\varGamma ^k\in B_e\). It is worth mentioning that, in practice, we are mainly interested in the deformation of X because we need to update the finite element mesh after each iteration. The update of the shape \(\varGamma ^k\) itself is contained in this deformation field. In contrast to Algorithm 3, Algorithm 1 can only work with surface shape derivative expressions. These surface formulations would give us descent directions (in normal direction) for \(\varGamma ^k\) only, which would not help us to move mesh elements around the shape. Additionally, when we are working with a surface shape derivative, we need to solve another PDE in order to get a mesh deformation in the ambient space X. Below, this issue is addressed in more detail.

Both approaches follow roughly the same steps but with a major difference in the way of computing the mesh deformation. For convenience we summarize one optimization iteration and the main aspects of the two approaches:

  1.

    Solve the state and adjoint equation.

  2.

    Compute the mesh deformation:

    • Algorithm 3 The computation of a representation of the shape gradient with respect to the chosen inner product of the tangent space is moved into the mesh deformation itself. In particular, we get the gradient representation and the mesh deformation all at once from (33), which is very attractive from a computational point of view. The bilinear form \(a(\cdot ,\cdot )\) in (34) is used both as an inner product and for the mesh deformation, leading to only one linear system which has to be solved. In practice, the bilinear form \(a(\cdot ,\cdot )\) in (34) is chosen as the weak form corresponding to the linear elasticity equation, i.e.,

      $$\begin{aligned} a(U,V)= \int _{X(\varGamma )} \sigma (U):\epsilon (V) \, dx, \end{aligned}$$

      where  :  denotes the sum of the component-wise products and \(\sigma \), \(\varepsilon \) are the so-called stress and strain tensors (cf. Remark 9), respectively. In strong form, (34) is given by

      $$\begin{aligned} \text {div}( \sigma )&= f^\text {elas} \quad \text {in} \quad X(\varGamma ) \end{aligned}$$
      (46)
      $$\begin{aligned} U&= 0 \quad \text {on} \quad \varGamma _\text {out} \end{aligned}$$
      (47)

      Here, the source term \(f^\text {elas} \) in its weak form is given by the shape derivative parts in volume form, where parts of the objective function leading to surface expressions only, such as, for instance, the perimeter regularization, are incorporated in Neumann boundary conditions. In our example, the shape derivative is given by

      $$\begin{aligned} \begin{aligned} DJ(\varGamma )[V] =&\int _{0}^{T}\int _{X(\varGamma )}-k\nabla y^T\left( \nabla V+\nabla V^T\right) \nabla p-p\nabla f^T V\\&+\mathrm {div}(V)\left( \frac{1}{2}(y-{\overline{y}})^2+\frac{\partial y}{\partial t}p+k\nabla y^T\nabla p-fp\right) dxdt\\&+\int _{\varGamma }\kappa \left<V,n\right>ds \end{aligned} \end{aligned}$$

      where \(\kappa \) denotes the mean curvature of \(\varGamma \) and \(y, \,p\) denote the solution of the state and adjoint equation, respectively (cf. [53]).

    • Algorithm 1 First, a representation of the shape gradient on \(\varGamma \) with respect to the Sobolev metric \(g^1\) as given in (22) needs to be computed by solving

      $$\begin{aligned} (I-AD^2_s)qn=rn. \end{aligned}$$
      (48)

      In our example, r is the density of the surface shape derivative derived in [53]; it involves the mean curvature \(\kappa \) of \(\varGamma \) and jumps across the interface \(\varGamma \), where the jump symbol denotes the difference of the traces from the two sides of \(\varGamma \). Here, \(y_{1} := \text {tr}_{\text {out}}(y\vert _{X\setminus {\overline{\varOmega }}})\) with y denoting the solution of the state equation, \(p_2 := \text {tr}_{\text {in}}(p \vert _{\varOmega })\) with p denoting the solution of the adjoint equation, and \(\text {tr}_{\text {in}}:\varOmega \rightarrow \varGamma \) and \(\text {tr}_{\text {out}}:X\setminus {\overline{\varOmega }} \rightarrow \varGamma \) are trace operators (cf. [53]). In order to compute a mesh deformation field, we need to solve a further PDE. In practice, this further PDE is again equation (46)–(47), but modified as follows: the Dirichlet boundary condition

      $$\begin{aligned} U = U^\text {surf} \quad \text {on} \quad \varGamma \end{aligned}$$

      is added to (46)–(47), where \(U^\text {surf}\) is the representation of the shape gradient with respect to the Sobolev metric \(g^1\); the source term \(f^\text {elas}\) is set to zero.

  3.

    Apply the resulting deformation to the current finite element mesh, and go to the next iteration.

Remark 9

The stress and strain tensors in (46) are defined by \(\sigma :=\lambda \text {tr}(\varepsilon ) I + 2 \mu \varepsilon \) and \(\varepsilon := \frac{1}{2}\left( \nabla U + \nabla U^T\right) \), where \(\lambda \) and \(\mu \) denote the so-called Lamé parameters. The Lamé parameters do not need to have a physical meaning here; it is rather essential to understand their effect on the mesh deformation. They can be expressed in terms of Young’s modulus E and Poisson’s ratio \(\nu \) as \(\lambda = \frac{\nu E}{(1+\nu )(1-2\nu )} ,\, \mu = \frac{E}{2(1+\nu )}\). Young’s modulus E describes the stiffness of the material, which allows one to control the step size for the shape update, and Poisson’s ratio \(\nu \) controls how much the mesh expands in the remaining coordinate directions when compressed in one particular direction.
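To make the mesh deformation step of Algorithm 3 concrete, the following sketch solves the linear elasticity system (46)–(47) with the stress tensor and Lamé parameters from Remark 9. It assumes the legacy FEniCS interface (dolfin); the mesh and the body force standing in for \(f^\text {elas}\) are hypothetical placeholders, whereas in an actual implementation \(f^\text {elas}\) is assembled from the volume expression of the shape derivative.

```python
from dolfin import *

mesh = UnitSquareMesh(64, 64)    # placeholder for the hold-all domain X(Gamma)
V = VectorFunctionSpace(mesh, "CG", 1)

E, nu = 1.0, 0.3                 # Young's modulus and Poisson's ratio (Remark 9)
lam = nu * E / ((1.0 + nu) * (1.0 - 2.0 * nu))   # Lame parameters from E and nu
mu = E / (2.0 * (1.0 + nu))

def sigma(u):
    # stress tensor of Remark 9: sigma = lam * tr(eps) * I + 2 * mu * eps
    eps = sym(grad(u))
    return lam * tr(eps) * Identity(2) + 2.0 * mu * eps

U, W = TrialFunction(V), TestFunction(V)
a = inner(sigma(U), sym(grad(W))) * dx     # bilinear form (34)
f_elas = Constant((0.0, -0.1))             # placeholder shape-derivative source
L = inner(f_elas, W) * dx

bc = DirichletBC(V, Constant((0.0, 0.0)), "on_boundary")  # U = 0 on Gamma_out
u = Function(V)
solve(a == L, u, bc)
ALE.move(mesh, u)                # step 3: apply the deformation to the FE mesh
```

Larger values of E stiffen the pseudo-material and hence shrink the deformation per iteration, which is precisely the step size control mentioned in Remark 9.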

Besides saving analytical effort during the calculation of the shape derivative, Algorithm 3 is computationally more efficient than Algorithm 1. The optimization algorithm based on domain shape derivative expressions (Algorithm 3) can be applied to very coarse meshes with approximately 100,000 cells (cf. right picture of Fig. 3). This is due to the fact that there is no dependence on normal vectors as in the case of surface shape gradients, which are needed in Algorithm 1. In [54, 61], the convergence of the gradient method for the surface and the volume shape derivative formulation is investigated for the parabolic shape interface problem. It can be observed that convergence with the representation of the shape gradient with respect to \(g^1\) seems to require fewer iterations than with the domain-based formulation. Yet, the domain-based form is computationally more attractive since it also works for much coarser discretizations. This can be seen in a comparison of the two meshes in Fig. 3: the mesh in the left picture shows the fineness necessary for the surface gradient (Algorithm 1) to lead to reasonable convergence, whereas the coarse grid in the right picture works only with the domain-based formulation (Algorithm 3).

Fig. 3 Different initial meshes

We can conclude that Algorithm 3 is very attractive from a computational point of view. However, the shape space \(B_e\), containing only smooth shapes, unnecessarily limits the application of this algorithm. More precisely, numerical investigations have shown that the optimization techniques also work on shapes with kinks in the boundary (cf. [52, 54, 55]). This means that Algorithm 3 is not limited to elements of \(B_e\) and another shape space definition is required. Thus, in [54], the definition of smooth shapes is extended to so-called \(H^{1/2}\)-shapes. In the next section, it is clarified what we mean by \(H^{1/2}\)-shapes. However, only a first attempt at a definition is given in [54]. From a theoretical point of view, there are several open questions about this shape space, the most important being what its structure is: without knowing the structure, there is no chance to get control over the space. Moreover, the definition of this shape space has to be adapted and refined. The next section is concerned with the novel space of \(H^{1/2}\)-shapes and, in particular, with its structure.

3 The Shape Space \({\mathcal {B}}^{\mathbf {1/2}}\)

The Steklov–Poincaré metric correlates shape gradients with \(H^1\)-deformations. Under special assumptions, these deformations give shapes of class \(H^{1/2}\), which are defined below. As already mentioned above, the shape space \(B_e\) unnecessarily limits the application of the methods described in the previous section. Thus, this section aims at a generalization of smooth shapes to shapes which arise naturally in shape optimization problems. In the setting of \(B_e\), shapes can be considered as the images of embeddings. From now on, we have to think of shapes as boundary contours of deforming objects. Therefore, we need another shape space. In this section, we define the space of \(H^{1/2}\)-shapes and clarify its structure as a diffeological space.

First, we not only define diffeologies and related objects but also explain the difference between diffeological spaces and manifolds (Sect. 3.1). In particular, we formulate the second main theorem of this paper, Theorem 3. Afterwards, the space of \(H^{1/2}\)-shapes is defined (Sect. 3.2). In the third main theorem, Theorem 4, we see that it is a diffeological space.

3.1 A Brief Introduction into Diffeological Spaces

In this subsection, we define diffeologies and related objects. Moreover, we clarify the difference between manifolds and diffeological spaces. For a detailed introduction to diffeological spaces we refer to [27].

3.1.1 Definitions

We start with the definition of a diffeological space and related objects like a diffeology, with which a diffeological space is equipped, and plots, which are the elements of a diffeology. Afterwards, we consider subset and quotient diffeologies. These two objects are required in the main theorem of Sect. 3.2. The definitions and theorems in this Sect. 3.1.1 are summarized from [27].

Definition 10

(Parametrization, diffeology, diffeological space, plots) Let Y be a non-empty set. A parametrization in Y is a map \(U\rightarrow Y\), where U is an open subset of \({\mathbb {R}}^n\). A diffeology on Y is any set \(D_Y\) of parametrizations in Y such that the following three axioms are satisfied:

  (i)

    Covering Any constant parametrization \({\mathbb {R}}^n\rightarrow Y\) is in \(D_Y\).

  (ii)

    Locality Let P be a parametrization in Y, where \(\text {dom}(P)\) denotes the domain of P. If, for all \(r\in \text {dom}(P)\), there is an open neighborhood V of r such that the restriction \(P\vert _V\in D_Y\), then \(P \in D_Y\).

  (iii)

    Smooth compatibility Let \(p:O\rightarrow Y\) be an element of \(D_Y\), where O denotes an open subset of \({\mathbb {R}}^n\). Moreover, let \(q:O'\rightarrow O\) be a smooth map in the usual sense, where \(O'\) denotes an open subset of \({\mathbb {R}}^m\). Then \(p\circ q\in D_Y\) holds.

A non-empty set Y together with a diffeology \(D_Y\) on Y is called a diffeological space and denoted by \((Y,D_Y)\). The parametrizations \(p\in D_Y\) are called plots of the diffeology \(D_Y\). If a plot \(p\in D_Y\) is defined on \(O\subset {\mathbb {R}}^n\), then n is called the dimension of the plot and p is called an n-plot.

In the literature, there are a lot of examples of diffeologies, e.g., the diffeology of the circle, the square, the set of smooth maps, etc. For those we refer to [27].
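As a simple illustration of Definition 10, consider the set of all classically smooth parametrizations of \({\mathbb {R}}^m\),

$$\begin{aligned} D_{{\mathbb {R}}^m}:=\left\{ p:U\rightarrow {\mathbb {R}}^m \,:\, U\subset {\mathbb {R}}^n \text { open},\ n\in {\mathbb {N}},\ p\in C^\infty (U,{\mathbb {R}}^m)\right\} . \end{aligned}$$

This is a diffeology: constant parametrizations are smooth (covering), smoothness can be checked on an open cover of the domain (locality), and compositions of smooth maps are smooth (smooth compatibility). We refer to \(D_{{\mathbb {R}}^m}\) as the smooth (standard) diffeology on \({\mathbb {R}}^m\).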

Remark 10

A diffeology as a structure and a diffeological space as a set equipped with a diffeology are distinguished only formally. Every diffeology on a set contains the underlying set as the set of non-empty 0-plots (cf. [27]).

Next, we want to connect diffeological spaces. This is possible through smooth maps between two diffeological spaces.

Definition 11

(Smooth map between diffeological spaces, diffeomorphism) Let \((X,D_X),(Y,D_Y)\) be two diffeological spaces. A map \(f:X\rightarrow Y\) is smooth if for each plot \(p\in D_X\), \(f\circ p\) is a plot of \(D_Y\), i.e., \(f\circ D_X\subset D_Y\). If f is bijective and if both f and its inverse \(f^{-1}\) are smooth, f is called a diffeomorphism. In this case, \((X,D_X)\) is called diffeomorphic to \((Y,D_Y)\).

The stability of diffeologies under almost all set constructions, e.g., the subset, quotient, functional or powerset diffeology, is one of the most striking properties of the class of diffeological spaces. In the following, we concentrate on the subset and the quotient diffeology. Both concepts are required in the proof of the main theorem in the next subsection.

Subset diffeology Every subset of a diffeological space carries a natural subset diffeology, which is defined by the pullback of the ambient diffeology by the natural inclusion.

Before we can construct the subset diffeology, we have to clarify the natural inclusion and the pullback. For two sets A and B with \(A\subset B\), the (natural) inclusion is given by \(\iota _A:A\rightarrow B\), \(x\mapsto x\). The pullback is defined as follows:

Theorem and Definition 12

(Pullback) Let X be a set and \((Y,D_Y)\) be a diffeological space. Moreover, \(f:X\rightarrow Y\) denotes some map.

  (i)

    There exists a coarsest diffeology of X such that f is smooth. This diffeology is called the pullback of the diffeology \(D_Y\) by f and is denoted by \(f^*(D_Y)\).

  (ii)

    Let p be a parametrization in X. Then \(p\in f^*(D_Y)\) if and only if \(f\circ p\in D_Y\).

Proof

See [27, Chap. 1, 1.26]. \(\square \)

The construction of subset diffeologies is related to so-called inductions.

Definition 13

(Induction) Let \((X,D_X),(Y,D_Y)\) be diffeological spaces. A map \(f:X\rightarrow Y\) is called induction if f is injective and \(f^*(D_Y)=D_X\), where \(f^*(D_Y)\) denotes the pullback of the diffeology \(D_Y\) by f.

The illustration of an induction as well as the criteria for being an induction can be found in [27, Chap. 1, 1.31].

Now, we are able to define the subset diffeology (cf. [27]).

Theorem and Definition 14

(Subset diffeology) Let \((X,D_X)\) be a diffeological space and let \(A\subset X\) be a subset. Then A carries a unique diffeology \(D_A\), called the subset or induced diffeology, such that the inclusion map \(\iota _A:A\rightarrow X\) becomes an induction, namely, \(D_A=\iota _A^*(D_X)\). We call \((A,D_A)\) the diffeological subspace of \((X,D_X)\).
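For instance, consider the cross \(K:=\{(x_1,x_2)\in {\mathbb {R}}^2:x_1x_2=0\}\), i.e., the union of the two coordinate axes, as a subset of \({\mathbb {R}}^2\) equipped with its smooth diffeology. By Theorem and Definition 12(ii), the subset diffeology is

$$\begin{aligned} D_K=\left\{ p:U\rightarrow K \,:\, \iota _K\circ p\in C^{\infty }(U,{\mathbb {R}}^2)\right\} , \end{aligned}$$

i.e., the plots of K are precisely the classically smooth parametrizations of \({\mathbb {R}}^2\) which take values in K. This example reappears in Sect. 3.1.2 in the discussion of smooth points.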

Quotient diffeology

Just as every subset of a diffeological space inherits the subset diffeology, every quotient of a diffeological space carries a natural quotient diffeology, defined by the pushforward of the diffeology of the source space to the quotient by the canonical projection.

First, we have to clarify the canonical projection. For a set X and an equivalence relation \(\sim \) on X, the canonical projection is defined as \(\pi :X\rightarrow X/\sim \), \(x\mapsto [x]\), where \([x]:= \{x'\in X:x\sim x'\}\) denotes the equivalence class of x with respect to \(\sim \). Moreover, the pushforward has to be defined:

Theorem and Definition 15

(Pushforward) Let \((X,D_X)\) be a diffeological space and Y be a set. Moreover, \(f:X\rightarrow Y\) denotes a map.

  (i)

    There exists a finest diffeology of Y such that f is smooth. This diffeology is called the pushforward of the diffeology \(D_X\) by f and is denoted by \(f_*(D_X)\).

  (ii)

    A parametrization \(p:U\rightarrow Y\) lies in \(f_*(D_X)\) if and only if every point \(x\in U\) has an open neighbourhood \(V\subset U\) such that \(p\vert _V\) is constant or of the form \(p\vert _V=f\circ q\) for some plot \(q:V\rightarrow X\) with \(q\in D_X\).

Proof

See [27, Chap. 1, 1.43]. \(\square \)

Remark 11

If a map f from a diffeological space \((X,D_X)\) into a set Y is surjective, then \(f_*(D_X)\) consists precisely of the plots \(p:U\rightarrow Y\) which locally are of the form \(f\circ q\) for plots \(q\in D_X\) since those already contain the constant parametrizations.

The construction of quotient diffeologies is related to so-called subductions.

Definition 16

(Subduction) Let \((X,D_X),(Y,D_Y)\) be diffeological spaces. A map \(f:X\rightarrow Y\) is called subduction if f is surjective and \(f_*(D_X)=D_Y\), where \(f_*(D_X)\) denotes the pushforward of the diffeology \(D_X\) by f.

The illustration of a subduction as well as the criteria for being a subduction can be found in [27, Chap. 1, 1.48].

Now, we can define the quotient diffeology (cf. [27]).

Theorem and Definition 17

(Quotient diffeology) Let \((X,D_X)\) be a diffeological space and \(\sim \) be an equivalence relation on X. Then the quotient set \(X/\sim \) carries a unique diffeological structure \(D_{X/\sim } \), called the quotient diffeology, such that the canonical projection \(\pi :X\rightarrow X/\sim \) becomes a subduction, namely, \(D_{X/\sim }=\pi _*(D_X)\). We call \(\left( X/\sim ,D_{X/\sim }\right) \) the diffeological quotient of \((X,D_X)\) by the relation \(\sim \).
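A standard example (cf. [27]) is the circle viewed as the quotient \({\mathbb {R}}/{\mathbb {Z}}\) of \(({\mathbb {R}},D_{{\mathbb {R}}})\) by the relation \(x\sim x' \Leftrightarrow x-x'\in {\mathbb {Z}}\). Its quotient diffeology is

$$\begin{aligned} D_{{\mathbb {R}}/{\mathbb {Z}}}=\pi _*\left( D_{{\mathbb {R}}}\right) , \end{aligned}$$

and, by Remark 11, the plots of \({\mathbb {R}}/{\mathbb {Z}}\) are precisely the parametrizations which locally lift through \(\pi \) to classically smooth real-valued maps.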

One aim of this paper is to go towards optimization algorithms in diffeological spaces. Thus, we end this subsection with a brief discussion of the topology of a diffeological space, which is necessary to discuss properties of optimization methods in diffeological spaces like convergence. Every diffeological space induces a unique topology, the so-called D-topology, a natural topology introduced by Patrick Iglesias-Zemmour for each diffeological space (cf. [27]). In particular, openness, compactness and convergence depend on the D-topology. Given a diffeological space \((X,D_X)\), the D-topology is the finest topology such that all plots are continuous. That is, a subset U of X is open (in the D-topology) if for any plot \(p:O\rightarrow X\) the pre-image \(p^{-1}(U)\subset O\) is open. For instance, the D-topology of \({\mathbb {R}}^n\) equipped with its smooth diffeology is the usual Euclidean topology. For more information about the D-topology we refer to the literature, e.g., [27, Chap. 2, 2.8] or [11]. Note that if \((X,D_X)\) is a diffeological space and a sequence \(\{x_n\}\) is known to converge with respect to a given topology on X, it is not guaranteed that \(\{x_n\}\) also converges in the D-topology because the D-topology is finer. Thus, all discussions about compactness, convergence, etc. in the diffeological sense reduce to the D-topology.

3.1.2 Differences Between Diffeological Spaces and Manifolds

Manifolds can be generalized in many ways. In [58], a summary and comparison of possibilities to generalize smooth manifolds are given. One generalization is a diffeological space, on which we concentrate in this section. In the following, the main differences between manifolds and diffeological spaces are worked out and formulated in Theorem 3. For simplicity, we concentrate on finite-dimensional manifolds. However, it has to be mentioned that infinite-dimensional manifolds can also be understood as diffeological spaces. This follows, e.g., from [32, Corollary 3.14] or [37].

Given a smooth manifold, there is a natural diffeology on this manifold consisting of all parametrizations which are smooth in the classical sense. This yields the following definition.

Definition 18

(Diffeological space associated with a manifold) Let M be a finite-dimensional (not necessarily Hausdorff or paracompact) smooth manifold. The diffeological space associated with M is defined as \((M,D_M)\), where the diffeology \(D_M\) consists precisely of the parametrizations of M which are smooth in the classical sense.

Remark 12

If M and N denote finite-dimensional manifolds, then \(f:M\rightarrow N\) is smooth in the classical sense if and only if it is a smooth map between the associated diffeological spaces \((M,D_M)\rightarrow (N,D_N)\).

In order to characterize the diffeological spaces which arise from manifolds, we need the concept of smooth points.

Definition 19

Let \((X,D_X)\) be a diffeological space. A point \(x\in X\) is called smooth if there exists a subset \(U\subset X\) which is open with respect to the topology of X and contains x such that \((U,D_U)\) is diffeomorphic to an open subset of \({\mathbb {R}}^n\), where \(D_U\) denotes the subset diffeology.

The concept of smooth points is quite simple. Let us consider the coordinate axes in \({\mathbb {R}}^2\), i.e., the cross K from the example in Sect. 3.1.1. All points of the two axes with the exception of the origin are smooth points: near any such point, the cross looks like an open interval, which is diffeomorphic to an open subset of \({\mathbb {R}}\). In contrast, no neighbourhood of the origin is diffeomorphic to an open subset of some \({\mathbb {R}}^n\), so the origin is not a smooth point.

Now, we are able to formulate the following main theorem:

Theorem 3

A diffeological space \((X,D_X)\) is associated with a (not necessarily paracompact or Hausdorff) smooth manifold if and only if each of its points is smooth.

Proof

We have to show the following statements:

  (i)

    Given a smooth manifold M, then each point of the associated diffeological space \((M,D_M)\) is smooth.

  (ii)

    Given a diffeological space \((X,D_X)\) for which all points are smooth, then it is associated with a smooth manifold M.

To (i) Let M be a smooth manifold and \(x\in M\) an arbitrary point. Then there exists an open neighbourhood \(U\subset M\) of x which is diffeomorphic to an open subset \(O\subset {\mathbb {R}}^n\). Let \(f:U\rightarrow O\) be a diffeomorphism. This diffeomorphism is a diffeomorphism of the associated diffeological spaces \((U,D_U)\) and \((O,D_O)\). Thus, \(x\in M\) is a smooth point. Since \(x\in M\) is arbitrary, each point of \((M,D_M)\) is smooth.

To (ii) Let \((X,D_X)\) be a diffeological space for which all points are smooth. Then there exist an open cover \(X=\bigcup _{i\in I} U_i\) and diffeomorphisms \(f_i:U_i\rightarrow O_i \) onto open subsets \(O_i\subset {\mathbb {R}}^{n}\). The transition map \(f_j\circ f_i^{-1}:f_i(U_i\cap U_j)\rightarrow f_j(U_i\cap U_j)\) is smooth (in the diffeological sense) for all \(i,j\in I\). Due to Remark 12, it is smooth in the classical sense for all \(i,j\in I\). Thus, \(\{(U_i,f_i)\}_{i\in I}\) defines a smooth atlas and a manifold structure on X. Let D be the associated diffeology. A similar argument as above shows that the diffeology D agrees with the original one, \(D_X\). \(\square \)

This theorem clarifies the difference between manifolds and diffeological spaces. Roughly speaking, a manifold of dimension n is obtained by gluing together open subsets of \({\mathbb {R}}^n\) via diffeomorphisms. In contrast, a diffeological space is also formed by gluing together open subsets of \({\mathbb {R}}^n\), with the difference that the gluing maps are not necessarily diffeomorphisms and that n can vary. However, note that manifolds deal with charts while diffeological spaces deal with plots. A system of local coordinates, i.e., a diffeomorphism \(p:U\rightarrow U'\) with \(U\subset {\mathbb {R}}^n\) open and \(U'\subset X\) open, can be viewed as a very special kind of plot \(U\rightarrow X\) which induces an induction on the corresponding diffeological spaces.

Remark 13

Note that we consider smooth manifolds which do not necessarily have to be Hausdorff or paracompact. If we understand a manifold as Hausdorff and paracompact, then the diffeological space \((X,D_X)\) in Theorem 3 has to be Hausdorff and paracompact. In this case, we need the concept of open sets in diffeological spaces. Whether a set is open depends on the topology under consideration. In the case of diffeological spaces, openness depends on the D-topology.

3.2 The Diffeological Shape Space

We extend the definition of smooth shapes, which are elements of the shape space \(B_e\), to shapes of class \(H^{1/2}\). In the following, it is clarified what we mean by \(H^{1/2}\)-shapes. We would like to recall that a shape in the sense of the shape space \(B_e\) is given by the image of an embedding from the unit sphere \(S^{d-1}\) into the Euclidean space \({\mathbb {R}}^d\). In view of our generalization, it has technical advantages to consider so-called Lipschitz shapes which are defined as follows.

Definition 20

(Lipschitz shape) A \((d-1)\)-dimensional Lipschitz shape \(\varGamma _0\) is defined as the boundary \(\varGamma _0=\partial {\mathcal {X}}_0\) of a compact Lipschitz domain \({\mathcal {X}}_0\subset {\mathbb {R}}^d\) with \({\mathcal {X}}_0\ne \emptyset \). The set \({\mathcal {X}}_0\) is called a Lipschitz set.

Examples of Lipschitz shapes are illustrated in Fig. 4. In contrast, Fig. 5 shows examples of non-Lipschitz shapes.

General shapes, in our novel terminology, arise from \(H^1\)-deformations of a Lipschitz set \({\mathcal {X}}_0\). These \(H^1\)-deformations, evaluated at a Lipschitz shape \(\varGamma _0\), give deformed shapes \(\varGamma \) if the deformations are injective and continuous. Such shapes are said to be of class \(H^{1/2}\); they were first proposed in [54]. The following definitions differ from [54]. This is because we aim to define the space of \(H^{1/2}\)-shapes as a diffeological space which is suitable for the formulation of optimization techniques and their applications.

Definition 21

(Shape space \({\mathcal {B}}^{1/2}\)) Let \(\varGamma _0\subset {\mathbb {R}}^d\) be a \((d-1)\)-dimensional Lipschitz shape. The space of all \((d-1)\)-dimensional \(H^{1/2}\)-shapes is given by

$$\begin{aligned} {{{\mathcal {B}}}}^{1/2}(\varGamma _0,{\mathbb {R}}^d):= \mathcal{H}^{1/2}(\varGamma _0,{\mathbb {R}}^d)\big /\sim \ , \end{aligned}$$
(49)

where

$$\begin{aligned} \begin{aligned}&{{{\mathcal {H}}}}^{1/2}(\varGamma _0,{\mathbb {R}}^d)\\&:= \{w:w\in H^{1/2}(\varGamma _0, {\mathbb {R}}^d) \text { injective, continuous; } w(\varGamma _0) \text { Lipschitz shape} \} \end{aligned} \end{aligned}$$
(50)

and the equivalence relation \(\sim \) is given by

$$\begin{aligned} w_1\sim w_2 \Leftrightarrow w_1(\varGamma _0)=w_2(\varGamma _0), \text { where } w_1,w_2\in \mathcal{H}^{1/2}(\varGamma _0,{\mathbb {R}}^d). \end{aligned}$$
(51)
Fig. 4 Examples of elements of \({\mathcal {B}}^{1/2}\) and, thus, Lipschitz shapes in the one- and two-dimensional case. The illustrated shapes are not elements of \(B_e\)
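To illustrate the equivalence relation (51), consider \(\varGamma _0=S^1\subset {\mathbb {R}}^2\) and, for a fixed angle \(\theta \), the maps

$$\begin{aligned} w_1:=\text {id}_{S^1}, \qquad w_2(x):=\begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}x. \end{aligned}$$

Both maps are injective, continuous and of class \(H^{1/2}(S^1,{\mathbb {R}}^2)\), and \(w_1(S^1)=w_2(S^1)=S^1\) is a Lipschitz shape. Hence \(w_1\sim w_2\) by (51), i.e., both represent the same \(H^{1/2}\)-shape, namely the unit circle itself: the equivalence relation identifies all parametrizations with the same image.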

The set \({{{\mathcal {H}}}}^{1/2}(\varGamma _0,{\mathbb {R}}^d)\) is obviously a subset of the Sobolev–Slobodeckij space \(H^{1/2}(\varGamma _0,{\mathbb {R}}^d)\), which is well-known to be a Banach space (cf. [39, Chap. 3]). Banach spaces are manifolds and, thus, we can equip \(H^{1/2}(\varGamma _0,{\mathbb {R}}^d)\) with the corresponding diffeology. This encourages the following theorem, which provides the space of \(H^{1/2}\)-shapes with a diffeological structure and which is the third (and, thus, last) main theorem of this paper.

Theorem 4

The set \(\mathcal{H}^{1/2}(\varGamma _0,{\mathbb {R}}^d)\) and the space \(\mathcal{B}^{1/2}(\varGamma _0,{\mathbb {R}}^d)\) carry unique diffeologies such that the inclusion map \(\iota _{\mathcal{H}^{1/2}(\varGamma _0,{\mathbb {R}}^d)}:\mathcal{H}^{1/2}(\varGamma _0,{\mathbb {R}}^d) \rightarrow H^{1/2}(\varGamma _0,{\mathbb {R}}^d)\) is an induction and such that the canonical projection \(\pi :{{{\mathcal {H}}}}^{1/2}(\varGamma _0,{\mathbb {R}}^d) \rightarrow \mathcal{B}^{1/2}(\varGamma _0,{\mathbb {R}}^d)\) is a subduction.

Proof

Let \(D_{H^{1/2}(\varGamma _0,{\mathbb {R}}^d)}\) be the diffeology on \(H^{1/2}(\varGamma _0,{\mathbb {R}}^d)\). Due to Theorem and Definition 14, \(\mathcal{H}^{1/2}(\varGamma _0,{\mathbb {R}}^d)\) carries the subset diffeology \(\iota _{\mathcal{H}^{1/2}(\varGamma _0,{\mathbb {R}}^d)}^*\left( D_{H^{1/2}(\varGamma _0,{\mathbb {R}}^d)}\right) \). Then the space \({{{\mathcal {B}}}}^{1/2}(\varGamma _0,{\mathbb {R}}^d)\) carries the quotient diffeology

$$\begin{aligned} D_{\mathcal{B}^{1/2}(\varGamma _0,{\mathbb {R}}^d)}:=\pi _*\left( \iota _{\mathcal{H}^{1/2}(\varGamma _0,{\mathbb {R}}^d)}^*\left( D_{H^{1/2}(\varGamma _0,{\mathbb {R}}^d)}\right) \right) \end{aligned}$$
(52)

due to Theorem and Definition 17. \(\square \)

So far, we have defined the space of \(H^{1/2}\)-shapes and shown that it is a diffeological space. The appearance of a diffeological space in the context of shape optimization can be seen as a first step or motivation towards the formulation of optimization techniques on diffeological spaces. Note that, so far, there is no theory for shape optimization on diffeological spaces. Of course, properties of the shape space \(\mathcal{B}^{1/2}\left( \varGamma _0,{\mathbb {R}}^d\right) \) have to be investigated. E.g., an important question is what the tangent space looks like. Tangent spaces and tangent bundles are important in order to state the connection of \(\mathcal{B}^{1/2}\left( \varGamma _0,{\mathbb {R}}^d\right) \) to shape calculus and, in this way, to be able to formulate optimization algorithms in \(\mathcal{B}^{1/2}\left( \varGamma _0,{\mathbb {R}}^d\right) \). There are many equivalent ways to define tangent spaces of manifolds, e.g., geometrically via velocities of curves, algebraically via derivations or physically via cotangent spaces (cf. [33]). Many authors have generalized these concepts to diffeological spaces, e.g., [12, 22, 27, 57]. In [57], tangent spaces are defined for diffeological groups by identifying smooth curves using certain states. Tangent spaces and tangent bundles for many diffeological spaces are given in [22], where smooth curves and a more intrinsic identification are used. However, in [12], it is pointed out that there are some errors in [22]. In [27], the tangent space to a diffeological space at a point is defined as a subspace of the dual of the space of 1-forms at that point; these are used to define tangent bundles. In [12], two approaches to the tangent space of a general diffeological space at a point are studied: the first one is the approach introduced in [22] and the second one uses smooth derivations on germs of smooth real-valued functions. Basic facts about these tangent spaces are proven, e.g., locality and that the internal tangent space respects finite products. Note that the tangent space to \({{{\mathcal {B}}}}^{1/2}\left( \varGamma _0,{\mathbb {R}}^d\right) \) as a diffeological space and related objects which are needed in optimization methods, e.g., retractions and vector transports, cannot be deduced or defined so easily. The study of these objects and the formulation of optimization methods on a diffeological space go beyond the scope of this paper and are topics of subsequent work. Moreover, note that the Riemannian structure \(g^S\) on \(\mathcal{B}^{1/2}\left( \varGamma _0,{\mathbb {R}}^d\right) \) has to be investigated in order to define \(\mathcal{B}^{1/2}\left( \varGamma _0,{\mathbb {R}}^d\right) \) as a Riemannian diffeological space. In general, a diffeological space can be equipped with a Riemannian structure as outlined, e.g., in [38].

Besides the tangent spaces, another open question is which assumptions guarantee that the image of a Lipschitz shape under \(w\in H^{1/2}\left( \varGamma _0,{\mathbb {R}}^d\right) \) is again a Lipschitz shape. Of course, the image of a Lipschitz shape under a continuously differentiable function is again a Lipschitz shape, but the requirement that w is a \({\mathcal {C}}^1\)-function is too strong. One idea is to require that w be a bi-Lipschitz function. Unfortunately, the image of a Lipschitz shape under a bi-Lipschitz function is not necessarily a Lipschitz shape, as the example given in [44, Sect. 4.1] shows. Another option is to generalize the concept of Lipschitz domains to non-tangentially accessible (NTA) domains. In order to formulate the definition of these domains, we need the concept of so-called Harnack chains.

Definition 22

(\(\alpha \)-Harnack chain) Let \(\alpha \ge 1\). For a metric space \(({\mathbb {R}}^d,\text {d})\) and an open set \(\varOmega \subset {\mathbb {R}}^d\), a sequence of balls \(B_0, \dots ,B_k\subset \varOmega \) is called an \(\alpha \)-Harnack chain in \(\varOmega \) if \(B_i\cap B_{i-1}\not = \emptyset \) for all \(i=1,\dots ,k\) and

$$\begin{aligned} \alpha ^{-1} \text {dist}(B_i,\partial \varOmega )\le r(B_i)\le \alpha \text {dist}(B_i,\partial \varOmega ), \end{aligned}$$

where \(\text {dist}(B_i,\partial \varOmega ):=\inf \limits _{x\in B_i, y\in \partial \varOmega } \text {d}(x,y)\) and \(r(B_i)\) is the radius of \(B_i\).
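As a simple worked example, consider the upper half-plane \(\varOmega :=\{(x_1,x_2)\in {\mathbb {R}}^2:x_2>0\}\) with the Euclidean metric. Any ball B of radius r centered at height 2r satisfies

$$\begin{aligned} \text {dist}(B,\partial \varOmega )=2r-r=r=r(B), \end{aligned}$$

so finitely many such balls with pairwise non-empty consecutive intersections form a 1-Harnack chain in \(\varOmega \).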

Definition 23

(Non-tangentially accessible domain) Let \(({\mathbb {R}}^d,\text {d})\) be a metric space. A bounded open set \(\varOmega \) is called a non-tangentially accessible domain (NTA domain) if the following conditions hold:

  (i)

    There exists \(\alpha \ge 1\) such that for all \(\eta >0\) and for all \(x,y\in \varOmega \) with \(\text {dist}(x,\partial \varOmega )\ge \eta \), \(\text {dist}(y,\partial \varOmega )\ge \eta \) and \(\text {d}(x,y)\le C\eta \) for some \(C>0\), there exists an \(\alpha \)-Harnack chain \(B_0,\dots ,B_k\subset \varOmega \) such that \(x\in B_0,y\in B_k\) and k depends on C but not on \(\eta \).

  (ii)

    \(\varOmega \) satisfies the corkscrew condition, i.e., there exist \(r_0>0\) and \(\varepsilon >0\) such that for all \(r\in (0,r_0)\) and \(x\in \partial \varOmega \) the sets \(B(x,r)\cap \varOmega \), \(B(x,r)\cap ({\mathbb {R}}^d\setminus {\overline{\varOmega }})\) contain a ball of radius \(\varepsilon r\).

Note that every Lipschitz domain is an NTA domain, so NTA domains indeed generalize Lipschitz domains. In fact, the image of an NTA domain under a global quasiconformal mapping is again an NTA domain (cf. [21]). If we consider boundaries \(\varGamma _0\) of NTA domains \({\mathcal {X}}_0\) instead of Lipschitz domains, the space \({\mathcal {H}}^{1/2}\) defined in (50) changes to

$$\begin{aligned} \begin{aligned}&{{{\mathcal {H}}}}^{1/2}(\varGamma _0,{\mathbb {R}}^d)\\ {}&:= \{w:w\in H^{1/2}(\varGamma _0, {\mathbb {R}}^d) \text { injective, continuous; }w=\text {tr}\,W \text { with }\\& W\in H^1({\mathcal {X}}_0, {\mathbb {R}}^d) \text { global quasiconformal and } \varGamma _0=\partial {\mathcal {X}}_0\}. \end{aligned} \end{aligned}$$
(53)

The resulting space of \(H^{1/2}\)-shapes also carries a diffeological structure if \(\varGamma _0\) is the boundary of an NTA domain \({\mathcal {X}}_0\); this follows as in Theorem 4.

Remark 14

A quasiconformal mapping of the open d-ball \(B^{d}\) induces a homeomorphism on the boundary for \(d=2,3\) (cf. [20, 45]). In [46], this result is generalized to higher dimensions. More precisely, it is proven that a quasiconformal mapping of an open ball in \({\mathbb {R}}^d\) onto itself extends to a homeomorphism of the closed d-ball. If we apply these results to our shape space, we get injectivity and continuity in (53) for free for \(\varGamma _0:=S^{d-1}\).

Fig. 5 Examples of non-Lipschitz shapes

4 Conclusion

The differential-geometric structure of the shape space \(B_e\) is applied to the theory of shape optimization problems. In particular, a Riemannian shape gradient and a Riemannian shape Hessian with respect to the Sobolev metric \(g^1\) are defined. The specification of the Riemannian shape Hessian requires the Riemannian connection, which is given and proven for the first Sobolev metric. It is outlined that we have to deal with surface formulations of shape derivatives if we consider the first Sobolev metric. In order to use the more attractive volume formulations, we consider the Steklov–Poincaré metrics \(g^S\) and state their connection to shape calculus by defining the shape gradient with respect to \(g^S\). The gradients with respect to both \(g^1\) and \(g^S\), together with the Riemannian shape Hessian, open the door to formulating optimization algorithms in \(B_e\). We formulate the gradient method in \((B_e,g^1)\) and \((B_e,g^S)\) as well as the Newton method in \((B_e,g^1)\). The implementation and investigation of Newton’s method in \((B_e,g^1)\) for an explicit example will be addressed in future work. In particular, the comparison of Newton’s method in \((B_e,g^1)\) and \((B_e,g^S)\) will be investigated in the future. A challenging question arising here is what an explicit formulation of the Riemannian shape Hessian with respect to the Steklov–Poincaré metric looks like. For this, we would generally need to work in fractional order Sobolev spaces and deal with the projected Poincaré–Steklov operator.

Since the shape space \(B_e\) limits the application of optimization techniques, we extend the definition of smooth shapes to \(H^{1/2}\)-shapes and define a novel shape space. It is shown that this space has a diffeological structure. In this context, we clarify the differences between manifolds and diffeological spaces. From a theoretical point of view, a diffeological space is very attractive in shape optimization: it can be supposed that a diffeological structure suffices for many of the differential-geometric tools used in shape optimization techniques. In particular, objects which are needed in optimization methods, e.g., retractions and vector transports, have to be deduced. Note that these objects cannot be defined so easily; additional work is required to formulate optimization methods on a diffeological space, which remains open for further research and will be addressed in subsequent papers.