Sobolev Gradients for the M\"obius Energy

Aiming at optimizing the shape of closed embedded curves within prescribed isotopy classes, we use a gradient-based approach to approximate stationary points of the M\"obius energy. The gradients are computed with respect to Sobolev inner products similar to the $W^{3/2,2}$-inner product. This leads to optimization methods that are significantly more efficient and robust than standard techniques based on $L^2$-gradients.


Introduction
Let γ : T → R m be a sufficiently smooth embedding 1 of the circle T into Euclidean space. Its Möbius energy [27,61] is defined as |γ (x)| |γ (y)| dx dy, (1) where γ (x, y) denotes the length of the shortest arc of γ connecting γ(x) and γ(y). The original motivation [28] was to define an energy that measures complexity or "entangledness" of a given curve. One may expect that minimization will unravel the initial configuration to a state of less complexity. Ideally, this should also preserve topological properties, in particular the isotopy class. By definition, an isotopy class is a path component in the space of embedded curves. The Möbius energy was designed to erect infinite energy barriers that separate isotopy classes within the space of curves. The term |γ(x) − γ(y)| −2 blows up whenever a self-contact emerges, lending itself as contact barrier for modeling impermeability of curves (0) (5) (10)  Figure 1.: Discrete Sobolev gradient descent subject to edge length constraint and barycenter constraint. The isotopy class is maintained along the iteration which is the crucial feature of a knot energy. As initial condition, we use a "difficult" configuration proposed in [27] (1648 edges; numbers in parentheses indicate the iteration steps). The global minimizer (the round circle) is reached after about 200 iterations. The curves have perceived constant thickness in the plots while a coordinate cross serves as a reference for the respective scaling factor. See also Figure 3 for a comparison to further optimization methods; the present one is "W 3/2,2 projected gradient, explicit". and rods. Moreover, this term promotes the spreading of the geometry, which indeed leads to the desired unfurling. Subtracting the second term −2 γ (x, y) guarantees that the energy is finite for sufficiently smooth embeddings. This way, any time-continuous descent method like, e.g., a gradient flow, will necessarily preserve the isotopy class. 2 Another pleasant feature of the Möbius energy is that its critical points enjoy higher smoothness.
In this paper we propose a new concept of numerical optimization techniques for the large family of self-repulsive energies by discussing the prototypical case of the Möbius energy. Due to the nonlocal point-point interactions (which manifest themselves in the occurrence of a double integral), any evaluation of the energy or its gradient is rather expensive; this renders the numerical optimization a challenging task. The key idea of our approach is to introduce a special geometric variant of the metric of the Sobolev space W 3/2,2 that discourages movement of an embedded curve in regions of near self-contact. Contrary to black-box approaches, our method allows us to minimize the Möbius energy of even quite complicated starting configuration within only a few hundred iterations (see Figure 1 and Figure 6). As illustrated in Figure 2, computing gradients with respect to this metric allows for choosing significantly larger step sizes compared to the L 2 -or even the W 3/2,2 -metric. This is in agreement with the interpretation of W 3/2,2 -gradient descent as a coarse discretization of an ordinary differential equation. In contrast to full discretization (i.e., in space and time) of a general (transient) partial : We visualize different gradients as vector fields along a given curve. The L 2 -gradient is pathologically concentrated on regions of near self-contact. Consequently, one has to pick tiny step sizes to prevent self-collision. The pure W 3/2,2 -gradient behaves much better in the sense that it is more uniformly distributed along the curve. However, this can still be improved considerably by adding a lower order term to the inner product that discourages movement in regions of near self-contact, cf. Proposition 4.1.
differential equation, an ordinary differential equation does not require any mesh-dependent bound on the time step size for stability. Consequently, our gradient descent scheme requires only few iteration steps, even for fine spatial resolution. This makes it, besides from being robust, particularly efficient. This is demonstrated by the performance comparison in Figure 3. Potential applications for self-repulsive energies are manifold as they can be employed as barriers for shape optimization problems and physical simulation with self-contact: They arise, for instance, in mechanics [21,29,47,80,81,93] and in molecular biology [22,23,34,35,52,53]. The Möbius energy can also be considered as differentiable relaxation of curve thickness. For example, as reported in [82], the speed of migration of knotted DNA molecules undergoing gel electrophoresis seems to be proportional to the average crossing number of the corresponding maximizers of curve thickness. Software tools for the maximization of thickness or equivalently, for the minimization of ropelength, have been developed in [65] (SONO) and [2] (ridgerunner). Further potential fields of applications for repulsive energies include computer graphics [10,79], packing problems [30,31], the modeling of coiling and kinking of submarine communications cables [24,100], and even solar coronal structures [70].

Previous work
Since its invention by O'Hara [61,62,63] and the very influential paper by Freedman, He, and Wang [27], the Möbius energy has been studied by many authors. Detailed investigations on its derivatives have been performed in [17,37,42]. Existence of minimizers in prime knot classes has been established in [27]. Invariance of the energy under conformal transformations of R m has been studied in [3,27,49,56]. Smoothness of minimizers has been established in [27,37], while smoothness and even analyticity of all critical points has finally been shown in [17] and [18]. Except for the global minimizer [27] and first results on critical points in nontrivial prime knot classes [14,44], almost nothing is known on the geometry of the energy space. In light of the Smale conjecture (proven by Hatcher [36]), it would be of great interest to know whether some gradient flow of the Möbius energy actually defines a retract of the unknots to  Figure 3.: Exemplary performance comparison between several feasible (top) and infeasible (bottom) optimization methods and with respect to various Sobolev metrics, applied to the initial configuration from Figure 1 (1648 edges). "Feasible" means that the constraints were respected in each iteration step (up to a certain tolerance, of course). "Infeasible" means that a penalty formulation was used in place of hard constraints. Each dataset corresponds to a combination of an optimization method (encoded by line dashing) and a Sobolev-metric (encoded by color; e.g., green corresponds to our Sobolev metric) that have been employed to compute gradients. We see that apart from the implicit projected gradient descent (which generally does not work well in this context), all optimization methods perform best in conjuction with our W 3/2,2 -metric. All experiments were implemented in Mathematica ® and ran single-threaded for 60 minutes on an Intel ® Xeon ® E5-2690 v3. Further details will be provided in Section 6.4. the round circles. The L 2 -gradient flow of the Möbius energy has been studied in [12,13,37].
Both theoretical and numerical results have been obtained on linear combinations of the bending energy and the Möbius energy [51,61,95]. More generally, in order to find minimizers of an elastic energy within an isotopy class, each knot energy can be employed in two ways: either as regularizer as it was done, e.g., in [4,5,6,29,32,40,94,97]; or by using it to encode a hard bound into the domain, which was done with the knot thickness in [39,76,96].  Figure 3 highly rely on the hardware. To provide a more independent comparison, we here plot the number of iteration steps versus the values of the Möbius energy attained after that time. Of course, as the effort involved for performing a single iteration step differs among the methods discussed here, it is debatable whether this is a meaningful unit after all.
The applicability of self-avoiding energies is heavily limited by their immense cost: Typical discretizations replace the double integrals by double sums which leads to a computational complexity of at least Ω((N · m) 2 ) for evaluating the discrete Möbius energy and its derivative, where N · m is the number of degrees of freedom of the discretized geometry (e.g., the number of vertices of a polygonal line times the dimension of the ambient space). This issue can be mended by sophisticated kernel compression techniques, see [99]. In this article, however, we focus on another issue that is more related to mathematical optimization, namely the fact that, for N → ∞, the discretized optimization problems become increasingly ill-conditioned. It is well-known that the convergence rate of many gradient-based optimization methods (method of steepest descent, nonlinear conjugate gradient method, and also more sophisticated quasi-Newton methods like L-BFGS) is very sensitive to the condition number of the Hessian of the energy (at a minimum) on the one hand, and the inner product that is used to compute the gradients on the other hand. The Hessian of the Möbius energy is deeply related to the fractional Laplacian (−∆) 3/2 which is a differential operator of order three, cf. [37]. Thus the condition number of the discrete problem grows like O(h −3 ) where h denotes the typical length of an edge in the discretization. In practice, this results in a rapid increase of the number of optimization iterations to "reach the minimizer" when the discretization is refined (i.e., for h → 0). Combined with the immense cost of evaluating E and DE, this leads to a prohibitively high cost of minimizing E with black-box optimization routines (see Figure 3 and Figure 4).
In particular, this issue applies to the explicit Euler time discretization scheme for the L 2gradient flow of the Möbius energy. Denoting the discretized energy by E h , the next time iterate γ (t+∆t) is computed from the current iterate γ t by solving This can also be reinterpreted as method of steepest descent with respect to the (discretized) L 2 -gradient and with step size ∆t > 0. Here the ill-conditioning manifests itself in the Courant-Friedrichs-Lewy condition: As the L 2 -gradient flow is a system of third order parabolic partial differential equations, the step size has to be truncated to ∆t = O(h 3 ) in order to make this scheme stable. This is also why a line search that enforces the Armijo condition (also referred to as first Wolfe condition), cf. [59,Chapter 3], will typically lead to tiny step sizes, rendering the method impractical for optimization (see Figure 3). It is well-known that the Courant-Friedrichs-Lewy condition can be circumvented by implicit time integration schemes. For example, in the implicit Euler or backward Euler scheme, one determines the next iterate γ (t+∆t) by solving the equation Standard techniques for solving this nonlinear equation, e.g., Newton's method, require solving multiple linearizations of the above equation and thus involve the Hessian DDE h in each time iteration. Moreover, the linearization has to be recomputed whenever the step size ∆t changes, which makes it nontrivial to set up an adaptive time stepping scheme. This explains why implicit time integrators turn out to be rather inefficient optimization schemes (see Figure 3). If one allows oneself to employ second derivatives of E h , applying Newton's method (and its damped or regularized derivates) for solving DE h (γ * ) = 0 in the first place would lend itself as a more efficient optimization algorithm. However, it is well-known that Newton's method does not necessarily perform well when applied far away from critical points.

Sobolev gradients
These problems can be overcome by optimization methods based on Sobolev gradients which are defined in terms of a Sobolev metric G that is "natural" for the Möbius energy. Blatt [11] characterized the energy space of the Möbius energy E as W 1,∞ (T; R m ) ∩ W 3/2,2 (T; R m ), cf. Theorem 2.1. Here and in the following, W s,p denotes the Sobolev-Slobodeckiȋ space of functions with "s fractional derivatives in L p " if s Z and a conventional Sobolev space for s ∈ Z. This result points to the fact that DE is a nonlinear differential operator of order 2 · 3 2 = 3, which has already been observed by He [37]. So morally, a suitable inner product G should be of the form Then the G-gradient grad(E)| γ at γ can be defined by the following weak formulation: So, at least formally, the G-gradient satisfies the equation By a somewhat naive counting of fractional derivatives, the right hand side is a nonlinear differential operator of order zero. Hence there is a chance that grad(E)| γ resides in the same Banach space as γ so that grad(E) would be a vector field. Then the evolution equation would actually be an ordinary differential equation. Indeed, this turns out to be true and is part of our main result (see Theorem 1.2). This seems to imply that no Courant-Friedrichs-Lewy condition applies to the discretized problem, so that the number of gradient descent iterations "to reach the minimum" is quite insensitive to the mesh resolution. At least, this is what we observed in our experiments.
Since the inner product G involves a choice of a Riemannian metric on the parametrization domain (line element and Laplacian), it is even more natural to define a γ-dependent family γ → G γ of inner products. With the Riesz operator I| γ u G γ (u, ·), the G-gradient can then be expressed by There are plenty of possible choices for I. Most important is that I| γ is an elliptic pseudodifferential operator of order three. All compact perturbations of I that are positive-definite will lead to the operator with the same qualitative properties. In particular, we are not limited to the exact fractional Laplacian; this gives us the freedom to pick an I| γ that is computationally more amenable. Up to lower order terms, we design G such that it resembles the W 3/2,2 -Gagliardo inner product, replacing intrinsic distances by (the easier computable) secant distances (see Proposition 4.1). For a curve parametrized by arc length (i.e., |γ | = 1) and up to lower order terms, it reads where in case of a curve γ parameterized by arc length the lower-order terms are given by T w(y) dy and γ denotes the geodesic distance introduced in (6) below. Here the first summand is essentially the W 1/2,2 -Gagliardo inner product with the energy density as additional weight.
Indeed, even if γ is not parametrized by arc length, a more detailed analysis reveals that I| γ has (up to a constant) the same principal symbol as (−∆ γ ) 3/2 where ∆ γ is the Laplace-Beltrami operator with respect to the Riemannian metric on T induced by the embedding γ (see the proof of Proposition 4.1).

As Riemannian as you can get
The overarching idea behind all this is to consider (C, G) as a Riemannian manifold and E : C → R as a smooth function. Here C denotes a Banach manifold of immersed embedded curves which will be defined in (5) below. If grad(E) is a well-behaved vector field on C, various optimization techniques that work on Riemannian manifolds can be utilized to minimize E. This is actually a long standing dream of differential geometers: to apply Riemannian geometry to an infinite-dimensional space of shapes. Such Sobolev inner products and their geodesics have been studied from a geometrical point of view, e.g., in [7,8,55]. It has observed that W 1,2 -inner products work well in the numerical treatment of full dimensional elasticity and of membrane energies such as the area functional for surfaces or the length functional for curves [58,66,75]. Moreover, it is known that W 2,2 -inner products provide good preconditioning for bending energies such as Bernoulli's elastic energy of curves, Kirchhoff's thin shell energy, the Willmore energy and Helfrich-type energies [26,38,74,75]. Various standard optimization schemes (e.g, nonlinear conjugate gradient, Nesterov's accelerated gradient, L-BFGS, trust region) can be sped up significantly by using the "right" notion of gradient (see Figure 3 and Figure 4). This is because these methods exploit that the gradient field is (locally) Lipschitz continuous with respect to the employed metric. Alas, the story here is not that simple, because there is no Morrey embedding from the energy space W 3/2,2 (T; R m ) to W 1,∞ (T; R m ) and any open W 3/2,2 -neighborhood of an embedded arclength parametrized W 3/2,2 -curve may contain non-embedded curves or curves with vanishing or infinite derivative. Therefore, Fréchet differentiability of the Möbius energy could only be established with respect to the somewhat artificial W 3/2,2 ∩ W 1,∞ -topology [17]. This problem can be resolved by working in the slightly smaller Banach space X W 3/2+ν,p (T; R m ) with Figure 6.: Discrete Sobolev gradient descent as in Figure 1 within the nontrivial knot class 7 2 , using the method "W 3/2,2 projected gradient, explicit". The initial configuration has 3000 edges and was randomly generated with KnotPlot [71].
suitable ν > 0 and p ≥ 2. Then X embeds into C 1 and the configuration space is an open subset of C 1 . We construct the Riesz isomorphism I| γ as an elliptic pseudo-differential operator of order three, and we show in Proposition 4.1 that it gives rise to a generalized Riesz isomorphism J| γ : X → Y where Y W 3/2−ν,q (T; R m ), with the Hölder conjugate q (1 − 1/p) −1 of p. Notice that J| γ does no longer identify X with its dual space as X Y , thus Y X . So one of our major tasks (see Theorem 3.1) will be to establish that DE(γ) ∈ Y whenever γ ∈ C. Moreover, we show that DE is locally Lipschitz continuous as a mapping C → Y , leading to our first main result: Combined with the Picard-Lindelöff theorem, this statement guarantees the short-time existence of the gradient flow, both for the downward and the upward direction.
In Section 5, we deal also with equality constraints, i.e., with Banach submanifolds of the form M { γ ∈ C | Φ(γ) = 0 }, where Φ : C → N is a suitable submersion, namely the constraint of constant speed and vanishing barycenter, cf. (33), into a further Banach space N = W σ+ν,p (T; R) ⊕ R m . We formulate a linear saddle point system for determining the projected gradient grad M (E| M )| γ and analyze when the system is solvable. We perform the analysis for a concrete set of constraints (fixed barycenter and parametrization by arc length), but we also try to outline which steps have to be taken for more general constraints. Finally, Theorem 5.1 will establish our second main result: Invoking the Picard-Lindelöff theorem again, we conclude that both the downward and the upward gradient flows of E| M exist for short times.
The question of long-time existence is much more involved. Following the way paved by Knappmann et al. [45] for a subfamily of integral Menger curvature functionals, one may derive this property in the case of subcritical Hilbert spaces. These correspond to the functionals obtained by replacing the squares in (1) by powers α ∈ (2, 3). Due to the fact that the general case where p 2 seems to be "degenerate" analogously to the p-Laplacian it seems unclear whether long-time existence can be established also for the setting discussed in this article.

Future directions
The present study demonstrates the design of a minimization scheme being both robust and efficient which is based on a metric that is tailored to the structure of a geometric nonlocal functional modeling self-avoidance.
The general strategy outlined in this paper applies to a large range of functionals on curves and surfaces of arbitrary dimension and codimension. We stress the fact that the arguments given below mainly rely on analytical features of a functional defined on fractional Sobolev spaces rather than on geometric peculiarities, except for the metric itself which has to be chosen carefully depending on the respective problem.
Although the definition of the Möbius energy has been motivated by the electrostatic energy [62], it is admittedly not a physical quantity in the first place. However, it seems to be an appropriate candidate to demonstrate the general approach while avoiding too much technicalities as, from an analyst's perspective, it is the most elementary smooth knot energy.
Even more importantly, one may find minimizers of physical functionals such as, e.g., the bending energy or the Helfrich energy within prescribed isotopy classes by a regularization approach, cf. [29]. In this context one may choose the regularizer to be a smooth repulsive functional which approximates the (reciprocal) thickness such as the tangent-point potential which has been employed e.g. in [4]. In combination with the technique described in the present paper, one may greatly improve not only the performance but also the complexity of the objects (i.e., isotopy types) that can be dealt with.
The higher-dimensional case as well as the adaption of this technique to other functionals is work in progress [98].

Preliminaries
General notation Throughout, we let T { x ∈ R 2 | |x| = (2 π) −1 } be the round circle with a fixed orientation and normalized to have total length |T| = 1. We will make use of the identification T R/Z whenever convenient. Moreover, we write T 2 = T × T for the Cartesian product of the circle with itself and denote by π 1 : T 2 → T and π 2 : T 2 → T the Cartesian projections onto the first and second factor, respectively. We denote the canonical intrinsic distance function on T by and the canonical line measure by dx or dy. Each sufficiently smooth immersed embedding γ : T → R m induces a line element ω γ (x) and a unit tangent field τ γ via Moreover γ induces two further distance functions that we have to distinguish: The secant distance | γ|(x, y) |γ(x) − γ(y)| and the geodesic distance γ ; more precisely, where I γ (x, y) denotes the shortest arc that connects x and y. Since γ is immersed, d T and γ are equivalent. We point out that this equivalence extends to | γ| if the embedding γ is sufficiently smooth, e.g., of class In this case γ is bi-Lipschitz continuous and the measures dx and ω γ are equivalent as well, i.e., there are c 1 , c 2 > 0 such that c 1 dx ≤ ω γ (x) ≤ c 2 dx holds for all x ∈ T. This implies that also the Lebesgue norms for 1 ≤ p < ∞ and any measurable function u : T → R m are equivalent. We also employ this notation for bivariate measurable functions U : Likewise, for 0 < σ < 1 and 1 ≤ p < ∞, the Sobolev-Slobodeckiȋ seminorms In all what follows, we will frequently make use of the following γ-dependent measures and operators: For example, the γ-dependent Sobolev-Slobodeckiȋ seminorm can be written much more economically as denotes the Lebesgue space with respect to µ γ and · L p µγ its associated norm. We define W s,p -seminorms for 1 < s < 2 by concatenating the W s−1,p -seminorms with suitable differential operators of first order: Here, the differential operator D γ can be interpreted as derivative with respect to arc length. Provided that γ is a sufficiently smooth immersed embedding, u W s,p [u] W s,p + u L p and u W s,p More precisely, the norm · W s,p γ is well-defined and equivalent to · W s,p if γ is an immersed embedding of class W S ,P (T; R m ) provided that one of the conditions for the "product rule" Lemma A.4 are met for

Spaces
Our initial motivation to consider W 3/2,2 -inner products for optimization is the following characterization of the energy space of the Möbius energy, i.e., of the smallest space that contains all finite-energy configurations: Theorem 2.1 (Blatt [11]) Let γ ∈ W 1,∞ (T; R m ) be an embedded immersed curve parametrized by arc length, i.e., |γ (x)| = 1 for a.e. x. Then one has E(γ) < ∞ if and only if γ ∈ W 3/2,2 (T; R m ).
Moreover, provided that γ has a certain minimal regularity, the differential of E has been characterized as a nonlinear, nonlocal "differential operator" of order 3 in the sense that DE(γ) is a distribution with three derivatives less than γ (see [37]). We will see this also in Theorem 3.1 below. As indicated in the introduction, instead of working with the energy space W 3/2,2 (T; R m ) ∩ W 1,∞ (T; R m ), we prefer spaces of curves with slightly higher regularity. In the first place, we avoid some technicalities effected by the critical scaling of W 1/2,2 (see [54]) related to discontinuous tangents, in particular with respect to product rules. Here and in the following, we fix parameters s, ν, and p satisfying In fact, we will soon focus on the case s = 3 2 only. Moreover, we think of ν being close to 0 and of p being close to 2. By the Morrey embedding theorem [25, Theorem 6.5], the space (5) is well-defined and an open subset of W s+ν,p (T; R m ). We consider the Banach spaces where q (1 − 1/p) −1 denotes the Hölder conjugate of p. For γ ∈ C, we will equip these spaces with the norms Their continuous dual spaces will be denoted by X , H , and Y . Since C ⊂ X is an open set, its tangent space T γ C is identical to X for each γ ∈ C. By the Sobolev embedding theorem, the canonical embeddings are well-defined and continuous with dense images. We point out that H is a Hilbert space; suitable scalar products on this space will play a pivotal role in defining the Sobolev gradients of the Möbius energy (see Section 4).
There are several reasons for picking the parameters ν and p as in (9): So far, it is only clear that p ≥ 2 and ν ≥ 0 are necessary for the existence of the continuous embeddings i C and j C while s + ν − 1/p > 1 is necessary for the Morrey embedding C → W 1,∞ (T; R m ). In addition to that, we require ν > 0 in order to be able to use certain product rules for bilinear maps of the form B : W s+ν,p × W s−ν,q → W s−ν,q and B : W s+ν,p × W s,2 → W s,2 as discussed in Lemma A.4. Indeed, the requirements s + ν − 1/p > 1 and ν > 0 allow us to treat all occurring nonlinearities in a satisfactory way. The condition p < ∞ guarantees that all involved Banach spaces are reflexive and separable.

Energy
From now on, if not stated otherwise, we fix s = 3 2 and suppose that ν > 0 and p ≥ 2. Our principal aim in this section is to investigate the Möbius energy along with its first two derivatives. The first two variations of the Möbius energy have been discussed under various regularity assumptions before, cf. [17,37,42]. The first variation is typically given in terms of principal-value integrals. Here, by keeping everything in weak (or variational) formulation, we can work with very low regularity assumptions and avoid principal-value integrals altogether.
The following statements hold true: 3. The mapping Y E : C → Y is locally Lipschitz continuous.
Proof. We are going to show that the energy density E : C → L 1 γ (T 2 ; R) is Fréchet differentiable. This will also imply that E is Fréchet differentiable with derivative identical to the linear form X γ E ∈ X defined by We do so by following a "shoot first ask questions later" approach. To this end, we first investigate pointwise derivatives of E(γ). For k ∈ N 0 and u 1 , . . . , u k ∈ X, we abbreviate Recall that the W s+ν,p -norm dominates the C 1 -norm. Thus, due to the definition of the geodesic distance in (6), for each point (x, y) in the open set there is an open neighborhood U(x, y) of C such that γ → I γ is constant on U(x, y). Consequently, sufficiently small perturbations of γ do not affect the integration domain of G k (γ; · · · ). Utilizing the formulas we obtain By pointwise differentiation at (x, y) ∈ Σ and by observing that Σ has full measure, we are lead to the following identities which hold almost everywhere on T 2 : Claim 1 below will imply that F 1 (γ; u 1 ) is indeed a candidate for DE(γ) u 1 . Moreover, it guarantees that the right hand side of (13) makes sense even if one replaces u ∈ X by w ∈ Y so that X γ E ∈ X has a unique continuous extension to an element Y γ E ∈ Y .
holds for all u 1 ∈ X. We split F 1 as follows: The desired bound for the first summand is derived in Lemma 3.4. The second summand can be treated with Lemma 3.3 because it has the form 2 B α,β γ (γ, u) with α = 0 and β = 2. We would like to use the L 1 -norm of F 2 to bound remainder terms of Taylor expansions. This will make use of the following claim.
Claim 2: There exists a number Ξ(γ) > 0, continuous in γ, such that for all u 1 , u 2 ∈ X, we have F 2 (γ; We may split F 2 (γ; u 1 , u 2 ) into the following four summands: Here, (15) and (16) are again of the type discussed in Lemma 3.3, namely with α = 0, β = 4 and α = 0, β = 2, respectively. We may factorize (17) as to discover that we have discussed its second factor already. Up to a γ-dependent constant, its first factor is bounded by 2 D γ u 1 L ∞ , thus dominated by u 1 X,γ . With H I γ D γ u 1 , D γ u 2 ω γ , we can split (18) into the following two summands: The first summand of (19) can be treated with Lemma 3.4. With ϕ i D γ γ, D γ u i and the identities γ = I γ ω γ (s) and Now the same techniques as in Lemma 3.4 and the product rule Lemma A.4 imply Figure 7.: Illustration why handling γ → γ correctly is so difficult. (a) Two points γ(x) and γ(y) on a circular curve and the geodesic arc γ(I γ (x, y)) connecting them. (b) A displacement vector field u along γ. (c) The geodesic arc η(I η (x, y)) connecting η(x) and η(y) on the displaced curve η = γ + u. Observe that I γ (x, y) I η (x, y), so (x, y) belongs to the "bad" set V(u). (d) The points (x, y), (y, x) and the "bad" set V(u), plotted over the relief of γ in the parameterization domain T 2 .
which proves the claim.
holds for all sufficiently small u ∈ X. Because C ⊂ X is open and Ξ is continuous, we may find an ε > 0 such that for all u X,γ < ε we have η γ + u ∈ C and Ξ(η) ≤ 2 Ξ(γ). By shrinking ε if necessary, we may achieve that all the densities Ω η and the norms · X,η are equivalent for all such u, i.e., there are c > 0 and Λ > 1 such that For the remainder of the proof, we let u ∈ X be of length u X,γ < ε and abbreviate η γ + u and γ t γ + t u. Now we split the integration domain , a "bad" part V(u), and an "ugly" part, the diagonal of T 2 (cf. Figure 7). Since the latter is a null set, it may be neglected. The other two parts are defined as follows: On the "good" part, we may apply Taylor's theorem along with Claim 2 and (21) to obtain: Here the first estimate is only admissible on the "good" set U(u) as the intrinsic distance (6) is differentiable. However, we cannot argue this way on the "bad" set V(u). Instead, we observe that V(u) has positive distance from the diagonal. Thus, by Claim 4 below, γ → E(γ)| V(u) is Lipschitz continuous as a map into L ∞ (V(u); R). Together with Claim 5 below, which states that V(u) is a small set, the triangle inequality Claim 4: γ → γ is locally Lipschitz continuous as a mapping into L ∞ .
As T 2 \ Σ is a set of measure zero, we may restrict our attention to (x, y) ∈ Σ. For η γ + u and (x, y) ∈ Σ there are two cases: The first case is I η (x, y) = I γ (x, y). Then the bound | η (x, y) − γ (x, y)| ≤ C D γ u L ∞ follows from the differentiability of ω γ . The second case I η (x, y) I γ (x, y) is a bit more elaborate. We abbreviate I I γ (x, y) and denote its complement by J T \ I. With (20), we obtain Here we used (20) for the first and the third inequality. This shows Combining these latter two inequalities proves Claim 4: Claim 5: V(u) Ω γ ≤ C D γ u L ∞ for u X,γ < ε. A Taylor expansion of the integrand leads to Then there is a t ∈ [0, 1] such that I γ+t u (x, y) = J and we have Together with Claim 4 this implies that V(u) is contained in a narrow band around the set { (x, y) | γ (x, y) = } whose area is proportional to D γ u L ∞ (cf. Figure 7d).
By the triangle inequality and the fundamental theorem of calculus, we may bound |(Y η E) w − (Y γ E) w| from above by Recall that on the "good" set U(u) we may interchange differentiation and integration. Hence, we have Now it follows from the other claims above that (22) is bounded by a multiple of u X,γ w Y,γ . For (23), we exploit that V(u) has finite, positive distance to the diagonal of T 2 : This implies that the quantities | γ(x, y)| and γ (x, y) are uniformly bounded away from zero and that this remains true for sufficiently small perturbations η = γ + u of γ. This is why we can express with Lipschitz continuous functions Z 1 and Z 2 . So the same applies to η, and Claim 5 shows that (23) is bounded by which finally proves the claim.
Remark 3.2 In fact, a bit more is true: The mapping Y E : C → Y is even continuously Fréchet differentiable and what we have shown in Claim 6 above is that D(Y E)(γ) : X×Y → R is a continuous bilinear form. Now we may conclude that its second derivative must satisfy D 2 E(γ)(u 1 , u 2 ) = D(Y E)(γ)(u 1 , j C i C u 2 ) for u 1 , u 2 ∈ X where i C and j C denote the canonical embeddings defined in (11). Although this might be relevant for optimization methods based on Newton's method and also for the implicit integration of the L 2 -gradient flow, we do not dive into details here.

Details
Here we state and prove the lemmas used in the proof of Theorem 3.1 above. The following is our main tool for dealing with the lower order terms that occur in Y γ E.
Let γ ∈ C be an embedded curve and let b : Then B α,β γ is well-defined and there is a continuous function Ξ : Here the expression b denotes the operator norm of the multilinear form b.
In this case, B α,β γ only depends on ψ.
By Fubini's theorem, the L 1 γ -norm of (24) is bounded by We employ the Hölder inequality to obtain an upper bound for this integral: For s = 3 2 ,s q 2 p (p − 2 + 2 p ν) −1 ≥ 1, we have a Sobolev embedding W s−1−ν,q → L˜q due to (9), see Lemma A.3, thus D γ ψ Lq γ ≤ C(γ) ψ W s−ν,q γ with C(γ) depending continuously on γ. The Hölder conjugate ofq isp 2 p (p + 2 − 2 p ν) −1 . Thus, we obtain the following upper bound: Exploiting that η (·) (θ i X) : T → T are isometries with respect to γ , and utilizing the substitution Y (θ 2 − θ 3 ) X, we may compute as follows: Here · B,γ is a natural, γ-dependent norm on the Besov space B s−1 2p,2 (T; R m ). This shows that The next statement allows us to handle the principal order terms of Y γ E.
Lemma 3.4 Suppose s = 3 2 and (9) with 1 p + 1 q = 1. Let I γ (x, y) ⊂ T denote a shortest arc with respect to γ that connects x and y. Then the bilinear form B γ : X × Y → R given by is well-defined and bounded. More precisely, we have |B γ (u, w)| ≤ [u] X,γ [w] Y,γ .
Proof. Using the following two identities we obtain Now we apply the same technique as in (25): Utilizing the notation of Lemma 3.3 and the substitutions y = η x (X), s = η x (θ 1 X), t = η x (θ 2 X), and Y = (θ 1 − θ 2 ) X, we arrive at: Thus, we can bound |B γ (u, w)| from above by

Metrics and Riesz isomorphisms
Next to the differential of the energy, the second ingredient that one requires for defining a gradient is a Riemannian metric on the configuration space C that has been introduced in (5). Below, we pick a suitable inner product on H which is essentially a geometric version of the W s,2 -Gagliardo inner product G from the introduction (see (4)). Throughout, we represent this inner product at the point γ ∈ C only by its Riesz isomorphism I C | γ : H → H . If I C | γ identified the tangent space T γ C of C at γ with the cotangent space, i.e., its dual T γ C, we could define the gradient of E by Alas, T γ C = X is not a Hilbert space for its Hilbert space completion (with respect to I C ) is H X, so that there cannot be any linear isomorphism T γ C → T γ C = X induced by a positive-definite bilinear form. However, we have already seen in Theorem 3.1 that DE(γ) can be interpreted as an element Y γ E in the smaller space Y ⊂ X , and that γ → Y γ E is locally Lipschitz continuous. Thus, Theorem 1.1 is proven as soon as we show that I C | γ induces an isomorphism J C | γ : X → Y which depends locally Lipschitz continuously on γ. This is our goal in this section.
Depending on context, we use ·, · for the dual pairing of a Banach space with its dual, e.g., ·, · : H × H → R and ·, · : Y × Y → R, as well as the Euclidean inner product on R m . Note that the canonical embeddings i C : X → H and j C : H → Y introduced in (11) give rise to dual maps i C : H → X and j C : Y → H . Proposition 4.1 For each γ ∈ C, with σ s − 1 = 1 2 and the notation from (7), (8), and (9), we define I C | γ : H → H and J C | γ : X → Y as follows: T w ω γ for v 1 , v 2 ∈ H, u ∈ X, and w ∈ Y. These operators are well-defined, continuously invertible, and they make the following diagram commutative: Moreover, the mappings I C : C → L(H; H ), γ → I C | γ , and J C : C → L(X; Y ), γ → J C | γ , are of class C 1 and hence locally Lipschitz continuous.
Here and in the following, a doubly headed arrow in the diagram (27) above indicates a linear operator with dense image.
Invertibility: We show this only for J C | γ , as the argument for I C | γ is analogous. First we observe that J C | γ is injective. Indeed, let u ∈ ker(J C | γ ). Then µγ + | T u ω γ | 2 implies that D γ u must be constant and that the mean value of u vanishes. But the first condition can only hold if u is already constant and the second one forces this constant to be zero. So it suffices to show that J C | γ is a Fredholm operator of index 0. To this end, we define the operator A γ : X → Y by Now we observe that A γ u, w = T 2 D σ+ν γ D γ u, D σ−ν γ D γ u dµ γ and that 3 we may conclude that A γ is a compact perturbation of J C | γ . So it suffices to show that A γ is a Fredholm operator of index 0. As this is a bit more involved, we defer this to Lemma 4.2 below.
Fréchet differentiability: This can be shown by utilizing basically the same technique as in the proof of the Fréchet differentiability of the energy: first order Taylor expansion of the integrand around the point γ and bounding the integral of the second order remainder term (see Claim 3 in Theorem 3.1). In fact, the analysis here is bit easier for the geodesic distance γ does not appear in the definitions of I C and J C . For the sake of brevity, we omit the details.

Details
The remainder of this section is devoted to proving the following lemma. The employed techniques are fairly standard in the area of pseudo-differential operators; only the fact that the differential operators involved here are nonlocal introduces a couple of further technicalities. Throughout, we will suppose that γ ∈ C. Moreover, we will denote by K : X → Z a generic compact operator that-pretty much like the ever-expanding "constant" C-may change from line to line.
Since the operator j C i C : X → Y is injective and has dense image, this also implies that A γ + M γ has dense image. By virtue of the Schauder lemma (see Lemma A.5), it suffices to establish an elliptic estimate of the form u X ≤ C (A γ + M γ ) u Y + K u Z for all u ∈ X with suitable norms · X and · Y that will be defined soon. Indeed, we will show in Lemma 4.6 that Since M γ is compact, this implies the previous inequality (for different C, K and Z).
Moreover, we use following abbreviations for the global Sobolev spaces and their norms: Via localization techniques, we will compare A γ to L :X →Ỹ given by Compare this to the weak formulation of the fractional Laplacian which is given by for some C σ > 0 (see e.g., [50, Theorem 1.1]). So up to a constant, we have L = D (−∆) σ D, hence L is a pseudo-differential operator of order 2 σ + 2 = 2s = 3 and its principal symbol [92, 2.3.8], the operator L is continuously invertible. 3 The following two lemmas will help us in localizing Sobolev norms: Their proofs are quite similar, so we show only the proof of the latter for it contains an additional difficulty.
Proof. (of Lemma 4.4) Since supp(ṽ) ⊂Ṽ ⊂Ũ, we obviously have 3 In the source it is shown that the operator (id −∆) σ : B t p,q → B t−2σ p,q is continuously invertible for general Besov spaces B t p,q . Note that W t,p = B t p,p for t ∈ R + \ Z and p ∈ [1, ∞). Moreover, (−∆) σ a compact perturbation of (id −∆) σ , so (−∆) σ is a Fredholm operator of index zero; being also positive-definite, it must be continuously invertible.
The following identity is caused by the nonlocality of D α and it is easily overlooked: Notice that we used here that supp(Dṽ) ⊂Ṽ. We assumed thatṼ ⊂⊂Ũ is relatively compact, so there is an R > 0 such that |x − y| ≥ R for all x ∈ R \Ũ and y ∈Ṽ. Because of 1 + α r > 1, which concludes the proof.
Now we can start to show (29), at least for functions u with small support.
Proof. We start by picking an isometric coordinate system around the point a ∈ T. With L T ω γ denoting the curve length, we choose U B L/4 (a), V B L/8 (a), and W B L/16 (a), where these balls are meant with respect to the intrinsic distance γ . We point out that U is geodesically convex, i.e., each shortest arc between two points in U is contained in U.  ± γ (a, x), where the sign depends on whether the shortest curve from a to x is oriented positively (+) or negatively (−) with respect to the standard orientation of T. Let u ∈ X be a function with supp(u) ⊂ W. Then we can find a unique functionũ ∈X with supp(ũ) ⊂W and u =ũ • f . 4 Because f is an isometry, we have u W s+ν,p ,γ (U) = ũ W s+ν,p (Ũ) . From Lemma 4.3 and from the continuous invertibility of L :X →Ỹ, we deduce that there is a C ≥ 0 (depending only on γ, s + ν, p, U, and W) such that Our next goal is to control Lũ Ỹ by A γ u Y modulo a compact operator. To this end, we choose a bump function η ∈ W s+ν,p (T; R) with values in [0, 1] that satisfies η(x) = 1 for all x ∈W and supp(η) ⊂ V. We denote byη ∈ W s+ν,p (R; R) the unique functionη • f = η and supp(η) ⊂Ṽ. These functions induce the following multiplication operators: Because of H u = u andHũ =ũ, we may split A γ u and Lũ into Claim: The operators Q A γ H − H A γ andQ LH −H L are compact. For 0 < α < 1 and v : T → R m , the Leibniz rule implies where π 1 , π 2 : T 2 → T are the projections given by π 1 (x, y) = x and π 2 (x, y) = y. This allows us to write with compact operators K 1 : X → L q µ γ (T 2 ; R m )) and K 2 : Y → L q µ γ (T 2 ; R m ). Now we have This shows the first statement. The second statement is proven analogously.
From this claim it follows that the "leading term" of Lũ isH Lũ. Next we are going to bound H Lũ Ỹ . For this it suffices to testH Lũ against only suchw ∈Ỹ with supp(w) ⊂ V. More precisely, we have For every suchw there is a w ∈ Y with supp(w) ⊂ V such that w =w • f . Since alsoũ andη are constructed in this way from u and η, we have By Lemma 4.3 and Lemma 4.4, there is a C ≥ 0 (depending only on γ, s − ν, q, U, W,Ũ, and Combined with (32), this leads to Again by Lemma 4.3 and Lemma 4.4, the mapping u → u = Hu →ũ is continuous. SinceQ is compact, the mapping u →Qũ is compact as well. This concludes the proof.
Finally, we pieces together the local estimates from above.
Lemma 4.6 (Global elliptic estimate) Let γ ∈ C. Then there are C > 0 and a compact, linear operator K : X → Z into some Banach space Z such that Proof. We start by covering T by finitely many open sets W i ⊂⊂ U i ⊂ T, i = 1, . . . , k, as in Lemma 4.5 such that there are local elliptic estimates of the form with suitable constants C i ≥ 0 and suitable compact operators K i : X → Z i into Banach spaces Z i . Now we pick a smooth partition of unity { ϕ 1 , . . . , ϕ k } ⊂ C ∞ (T; [0, 1]) subordinate to { W 1 , . . . , W k }. We denote the corresponding multiplication operators by Φ i : X → X, With the triangle inequality, we obtain The separate claim in the proof of Lemma 4.5 shows that this concludes the proof.

Constraints
Our aim in this section is to set up constraints on the barycenter and on the parametrization of curves and to show Theorem 1.2, i.e., well-definedness of the associated projected gradient and its flow.
Here, as before, we abbreviated σ s − 1 = 1 2 . By the choice of ν and p (see (9)), we have C ⊂ W 1+σ+ν,p (T; R m ) ⊂ W 1,∞ (T; R m ). Hence for each γ ∈ C, the functions x → |γ (x)| and x → |γ (x)| −1 are both members of W σ+ν,p (T; R) → L ∞ (T; R). By the chain rule Lemma A.1, the following mapping is well-defined: A curve γ ∈ C is parametrized by constant speed L and has 0 as barycenter if and only if Φ(γ) = (0, 0). Our main task is to prove Theorem 5.1 below; it states that the feasible set equipped with a generalized Riesz isomorphism inherited from J C is almost a Riemannian manifold, at least in view of the projected or intrinsic gradients. Theorem 1.2 will follow from this immediately.
To this end (and in analogy to the space triple X, H, and Y), we introduce the Banach space triple XN W σ+ν,p (T; R) ⊕ R m , HN W σ,2 (T; R) ⊕ R m , YN W σ−ν,q (T; R) ⊕ R m and the continuous dense injections i N : XN → HN and j N : HN → YN. A straightforward computation shows that Φ is differentiable and that its derivative DΦ(γ) : X → XN is given by Lemma A.4 implies that u → D γ γ, D γ u induces well-defined and continuous linear operators X → XN, H → HN, and Y → YN, provided that ν > 0, and p ≥ 2. With the usual convention X γ Φ DΦ(γ) etc. we generate a triple (X γ Φ, H γ Φ, Y γ Φ) of continuous, linear operators that makes the following diagram commutative: By Lemma 5.2, the mapping Φ is a submersion. Thus the implicit function theorem implies that the set M is a Banach submanifold of C. For γ ∈ M and Z ∈ { X , H , Y }, define . Its well-definedness is established by the following theorem which states that M has a "nearly Riemannian structure". Note that M cannot support a Riemannian structure because the tangent space T γ M is not Hilbertable in the sense that there is no positive-definite bilinear form whose norm topologizes T γ M. Observe that The Galerkin projection of a Hilbert space's Riesz isomorphism onto a closed subspace equals the Riesz isomorphism of the restricted scalar product. So the invertibility of I M is straight-forward. The nontrivial part here is to show that J M is continuously invertible. By the open mapping theorem, it suffices to show that J M is both injective and surjective. Injectivity can be deduced from the injectivity of I C , H γ F, and i M as follows: Let u ∈ ker(J M | γ ) and put v H γ F i M | γ u. Now the following shows that u = 0: In order to establish surjectivity of J M , we fix an arbitrary η ∈ Y γ M. By the Hahn-Banach theorem, the mapping Y γ F : Y → Y γ M is surjective. Thus there is an η 0 ∈ Y with Y γ F η 0 = η. By Lemma 5.4 below, the saddle point problem has a unique solution with u 0 ∈ X and λ 0 ∈ Y N. In particular, we have u 0 ∈ ker(X γ Φ), hence we may write u 0 = X γ F u with u ∈ X γ M. Thus, we have

Details
Now, statements and proofs of the auxiliary results are in order.

Lemma 5.2 (Right inverse)
The triple (X γ Φ, H γ Φ, Y γ Φ) induced by the derivative DΦ(γ) allows a triple (X γ B, H γ B, Y γ B) of continuous right inverses such that the following diagram commutes: Moreover (X γ B, H γ B, Y γ B) depend smoothly on γ and in particular, they are locally Lipschitz continuous.
Proof. Denote by pr ⊥ γ (x) id R m −τ γ (x)⊗ τ γ (x), · the orthogonal projector onto the orthogonal complement of τ γ (x). Fix a Z ∈ { X , H , Y } and a (ξ, U) ∈ ZN. Denote the length of γ by L = T ω γ . Fix a given point y 0 ∈ T and define v(y) y y 0 ) ω γ (x) with a vectorŨ ∈ R m to be determined later. This way, the components of DΦ(γ) u stated in (34) amount to Using the product rule Lemma A.4 (for which p ≥ 2 is crucial here), we see that u is a member of Z, provided that we can find aŨ ∈ R m such that u becomes continuous at y = y 0 . For this, it is necessary and sufficient that T τ γ ξ + pr ⊥ γŨ ω γ = 0. Define the vector b γ (ξ) ∈ R m and the symmetric matrix Θ γ ∈ End(R m ) by b γ (ξ) T τ γ ξ ω γ and Θ γ T pr ⊥ γ ω γ . Assume Θ γ is not invertible. Then there is a unit vector V ∈ R m in its kernel and we have 0 = V, Θ γ V = T |V| 2 − τ γ , V 2 ω γ . But that means that τ γ (x) = ±V has to hold for almost every x ∈ T. Since τ γ is continuous, this implies that γ is a straight line, which is impossible due to γ being closed. This contradicts our assumption and thus Θ γ must be invertible. So we may chooseŨ −Θ −1 γ b γ (ξ) and put Z γ B (ξ, U) u. By (38), Z γ B is indeed a right inverse of Z γ Φ. Finally, it is only a matter of some elementary calculus to show that Z γ Φ and Z γ B depend smoothly on γ.
In analogy to Proposition 4.1, we may equip the target space N X N with the following Riesz isomorphisms. This will help us to generalize the concept of adjoint operators between Hilbert spaces. Proposition 5.3 Analogously to Proposition 4.1, we define the γ-dependent, linear operators I N | γ : HN → H N and J N | γ : XN → Y N as follows: is continuously invertible.
Proof. Let B be as in Lemma 5.2 above. As J C is invertible, the saddle point matrix A| γ is invertible if and only if its Schur complement S −X γ Φ (J C | γ ) −1 Y γ Φ is invertible. In analogy to the adjoint operators H * γ Φ = (I C | γ ) −1 H γ Φ (I N | γ ) and H * γ B = (I N | γ ) −1 H γ B (I C | γ ), we introduce the generalized adjoint operators Observe that we may express the Schur complement as Utilizing the identities j C J C = I C i C and j N J N = I N i N as well as the diagram (35), one verifies that This shows that X γ Φ X * γ Φ is injective. By Lemma 5.5 below; the operator X * γ B X γ B is invertible. This allows us to define the projector Q Finally, the open mapping theorem implies that X γ Φ X * γ Φ is continuously invertible.
Lemma 5.5 (Invertibility of B * B) For each γ ∈ C, the linear operator X * γ B X γ B is continuously invertible.
Thus it suffices to show that T is invertible. Letξ (ξ, U) ∈ XN andψ (ψ, W) ∈ YN. Put u X γ Bξ and w Y γ Bψ. By construction, we have With the notation from the proof of Lemma 5.2, we putŨ −Θ −1 γ b γ (ξ) andW −Θ −1 γ b γ (ψ). Now we observe that Writing only the terms of highest order in ξ and ψ, we obtain The latter pairing is identical to J N | γξ ,ψ up to the term T ξ ψ ω γ + U, W , which is a combination of lower order and finite rank, thus represents a compact operator XN → Y N. This means that T is a compact perturbation of J N | γ and thus a Fredholm operator of index 0.
Hence it suffices to show that T is injective. Letξ ∈ ker(T ) and putη i Nξ . A diagram chase in (37) and (27) yields Since H γ B is injective and since I C | γ ·, · is a scalar product on H, this implies i Nξ =η = 0. The injectivity of i N yieldsξ = 0 and we see that T is injective. So as an injective Fredholm operator with index zero, T must also be surjective, hence continuously invertible by the open mapping theorem.
The invertibility of the saddle point matrix leads to the following generalizations of (i) the Moore-Penrose pseudoinverse of a surjective operator between Hilbert spaces and (ii) the orthoprojector onto the orthogonal complement of the operator's null space. Being able to reduce the action of these operators to solving a linear saddle point system will be crucial for applications (see Section 6).
For ξ ∈ XN andũ ∈ X, the operators can be evaluated by solving the following saddle point systems where λ, µ ∈ Y N act as Lagrange multipliers.

Computational treatment
For the ease of use, we discretize curves by polygonal lines and approximate the Möbius energy and the Riesz isomorphisms from Proposition 4.1 by simple quadrature rules. In the language of finite element analysis, we employ a nonconforming Ritz-Galerkin scheme because the discrete ansatz space is not a subset of the smooth configuration space. We try to outline a discrete setting that can be applied also to more general self-avoiding energies; therefore, we do not care about Möbius-invariance of the energy, although Möbius-invariant discretizations have already been proposed (see e.g., [49] and [15,16]).

Spatial discretization
Let T denote a partition of T with vertex set V(T ) ⊂ T and edge set E(T ) ⊂ V(T ) × V(T ). Denote the number of edges by N. If the partition is sufficiently fine, i.e., h(T ) max I∈E(T ) |I| is sufficiently small, then we may identify each edge with the closed, oriented interval connecting its end vertices. For an edge I ∈ E(T ), we denote by I ↓ ∈ V(T ) and I ↑ ∈ V(T ) its backward and forward boundary vertex, respectively. Let P : V(T ) → R m be an embedded polygon in R m , i.e., there is a piecewise linear embedding γ : T → R m such that γ| V(T ) = P and such that γ maps I affinely onto the line segment connecting P(I ↓ ) to P(I ↑ ). We denote by C T the set of such embedded polygons which is an open set in the space of all closed polygons with N edges. Since the latter is finite dimensional and isomorphic to (R m ) N , we have X T = H T = Y T (R m ) N . Likewise, we discretize the target spaces by By P (I) |P(I ↓ ) − P(I ↑ )|, we denote the edge length of edge I.

Discrete energy
There are several possibilities to discretize the Möbius energy E. A very general approach employs simple quadrature rules and works for reparametrization-invariant energies F of the form F (γ) = T 2 F(γ) Ω γ with some energy density F(γ) : T 2 → R. If, for a sufficiently smooth curve γ, the integrand F(γ) is not too singular around the diagonal of the integration domain T 2 , we have Typically, the right hand side makes sense also if γ is a polygonal line. Indeed, cutting out the diagonal is somewhat necessary: An elegant scaling argument in [89, Figure 2 So with a k-point quadrature rule t 1 , . . . , t k ∈ [0, 1], ω 1 , . . . , ω k ∈ R, we may discretize F by F T (P) Ī ∩J=∅ W I J (P) with the local contributions Applying this with k = 1 to F = E from (12), one is naturally lead to the vertex energy (t 1 = 0, ω 1 = 1) and to the edge energy (t 1 = 1/2, ω 1 = 1) as proposed by Kusner and Sullivan in [48]. Scholtes proved in [73] that the vertex energy for equilateral polygons Γ-converges towards E under refinement of partitions, i.e., for h(T ) → 0, with respect to the W k,q -topology, k ∈ { 0, 1 }, q ∈ [1, ∞]. Roughly speaking, Γ-convergence implies that cluster points of minimizers of the discrete energies are minimizers of E. This result justifies the quite harsh variational crimes that one commits by choosing polygonal lines as discrete configurations. Although it is restricted to equilateral polygons (which was one of the reasons for us to include the edge length constraint), we deem it likely that it can be extended to non-equilateral polygons with a uniform bound on max P /min P as h → 0. At least our experiments indicate that the precise distributions of edge lengths does not matter. We require also the derivative of the discrete energy. Similarly as in Section 3, the explicit dependence of E on the geodesic distance γ causes problems: Without taking further measures, this would lead to the very high complexity of Ω(N 3 ) to assemble the derivative DE T (P) for the vertex energy and edge energy. 5 This can be circumvented by utilizing the identity which was derived by Ishizeki and Nagasawa in [41]. For the sake of efficiency, we discretize with the midpoint rule, i.e., with k = 1, t 1 = 1/2, and ω 1 = 1. For this F, the local contributions W I J (P) depend only on the coordinates of the four points P(I ↓ ), P(I ↑ ), P(J ↓ ), and P(J ↑ ). So the expression of the first and second derivative of W I J with respect to these four points can once be computed symbolically and compiled into runtime-efficient libraries. The first and second derivative of F T can then be assembled from DW I J (P) and D 2 W I J (P) as a vector and a matrix of size m N and (m N) × (m N), respectively. Due to the nonlocal nature of the energy, the matrix D 2 F (P) is dense.

Discrete inner product
Next we discretize the inner product I C from Proposition 4. The first two terms of I C u, u can now be discretized as follows: I∩J=∅ P (I) P (J) u I (I ↑ )−u I (I ↓ ) where we employ the same quadrature rule as for the discrete Möbius energy. In the presence of a barycenter constraint, we may simply omit the term T u ω γ , T u ω γ without loosing definiteness of the inner product on ker(DΦ T (P)). By virtue of the polarization formula, this defines the Gram matrix uniquely, leading to discrete bilinear forms G P = I C T | P = J C T | P . The local matrices are of size (4 m) × (4 m) (m coordinates for each of the four vertices belonging to the edge pair (I, J)). They can be computed in parallel and added into the global matrix afterwards. The resulting global Gram matrix is a dense matrix of size (m N) × (m N). 6

Discrete constraints
As for the constraints, we discretize Φ by Φ T (P) log( P (I)) − log( 0 (I)) I∈E(T ) , I∈E(T ) 1 2 P (I) (P(I ↑ ) + P(I ↓ )) , where 0 : E(T ) → ]0, ∞[ is a prescribed distribution of desired edge lengths, for example 0 (I) = L |I|. Although restoring feasibility for the edge length constraint comes at a certain cost, it prevents edges from collapsing to points and from being overstretched in the course of optimization. The latter is crucial since the discrete energy is not exactly self-avoiding; it becomes singular only if quadrature points approach each other. So overstretched edges make it more likely that the curve tries to form a self-intersection.
Some care should be given to the choice of the target edge lengths 0 . A coarse mesh may not be sufficient to preclude self-intersections, a very fine mesh is expensive as the computational effort grows quadratically in the number of nodes. As a rule of thumb, the distance between two neighboring vertices of a polygon should be strictly smaller than the distance between any other pairs of vertices. In principle, it is also possible to drop the edge length constraints; instead one could introduce a global length constraint and one could handle short and long edges by adaptive edge split and edge collapse strategies. We refrained from opting for this route here for the sake of simplicity.

Projected gradient
Once the vector Y P E T = DE T (P), and the matrices I C T | P and X P Φ T = Y P Φ T = DΦ T (P) have been assembled, the projected gradient u grad M T (E T | M T )| P can be obtained by solving the following discrete analogue of the linear saddle point system (36): We assemble the saddle point matrix as a dense, symmetric matrix with (N m + N + m) rows, and solve it via a dense LU-factorization. Hence it costs roughly O(N 2 m 2 ) for the assembly and a further O(N 3 m 3 ) for the factorization. It is not surprising that this is the most expensive part in the overall optimization process. We would like to point out that this can be sped up considerably by more sophisticated methods: The assembly of the saddle point matrix can be avoided by assembling DΦ T (P) as a sparse matrix and by compressing J C T | P in a hierarchical matrix data structure that is efficient for fast matrix-vector multiplication. Similar techniques can be employed to approximate E T (P) and DE T (P) in subquadratic time, but all this is beyond the scope of the present work.

Restoring feasibility and time step size rules
Suppose that Φ T (P) = 0 and that u is a feasible search direction, i.e., DΦ T (P) u = 0. The constraint mapping Φ T is Lipschitz continuously differentiable. Hence provided that the step size τ > 0 is sufficiently small, the modified Newton method converges quickly to a point Q ∞ that satisfies Φ T (Q ∞ ) = 0. Here DΦ T (P) † denotes the Moore-Penrose pseudoinverse with respect to the inner product G P and we utilize Corollary 5.6 to evaluate it. 7 For a given descending direction u, we may apply backtracking line search to find a suitable step size τ > 0: If the residual Φ T (Q i ) is smaller than a prescribed tolerance after a small, prescribed number of iterations, then the point Q i may serve as the next iterate of the optimization method. Otherwise we shrink τ and restart the modified Newton method. By shrinking τ even further, if necessary, we can also achieve that Q i satisfies the Armijo condition E T (Q i ) ≤ E T (P) + (τ/2) DE T (P) u. An initial guess for τ can be obtained, e.g., by collision detection (see, e.g., [69]): One determines the smallest step size τ * such that P + τ * u has a self-intersection and starts the backtracking procedure with, e.g., τ = 2 3 τ * . By utilizing suitable space partitioning data structures, this collision detection can be performed in subquadratic time. However, we simply cycled over all O(N 2 ) edge pairs because its runtime is proportional to the runtime of DE T (P). Figure 3 Feasible methods Projected L 2 -, W 1,2 -, W 3/2,2 -, and W 2,2 -flows were simulated both with explicit and implicit time integration schemes. We followed the approach above, only replacing J C T | P by the Riesz operator corresponding to the particular choice of metric. Armijo backtracking line search automatically determines a stable step size. For the implicit integration of the L 2 -gradient flow, we employ the backward Euler method. Since it is not unconditionally stable, Armijo backtracking has to be employed also here. Because backtracking requires the implicit equations to be solved again, this is particularly expensive.

Optimization methods employed in
The employed trust region method is a blend of the method from [20] with the two-dimensional subspace method from [77] (without computing the lowest eigenvalues): The next iterate is found by minimizing a quadratic model in a trust region within a low-dimensional subspace spanned by the current projected gradient, the projection of the previous gradient onto the current tangent space, and the Newton search direction -provided the current projected gradient is shorter than a given threshold. This means that the optimization is mostly driven by gradient and momentum; and the Hessian is utilized only in the end phase of optimization. Shrinkage and expansion of the trust region is handled as usual, but the radius is of course to be interpreted with respect to the employed inner product.
Infeasible methods In order to compare also to unconstrained optimization methods, we applied them to an analogous discretization of the penalized energy E α (γ) E(γ) + α Φ(γ) 2 L 2 = E(γ) + α T log(|γ (t)|/L) 2 L dt, whose penalty can be interpreted as Hencky's stretch energy. The optimization methods were made aware of this penalty by using the metric J C | γ + α DΦ(γ) I L 2 DΦ(γ) to compute gradients, where I L 2 denotes the Riesz operator of L 2 (T; R). 8 As nonlinear conjugate gradient method, we employed the Polak-Ribière method "with automatic reset" (method PR + in [59, Section 5.2]). L-BFGS was implemented with history length 30 and as described in [59,Section 7.2]. The only difference is that we replace the initial guess for the inverse Hessian by the inverse of the current metric (because using a single initial guess turned out to be less efficient). 9 As for Nesterov's accelerated gradient method (acc. grad.), we followed [57], but added collision detection to truncate the step sizes (in both steps of the method). Moreover, as suggested in [60], we reset the momentum to 0 whenever an increase of the objective was observed. 10 All these methods were complemented with a line search that tries to find a weak Wolfe-Powell step size.

A. Auxiliaries
We require some technical results for Sobolev-Slobodeckiȋ spaces on the circle T. Typically, such statements are formulated on R n or for sufficiently smooth domains Ω ⊂ R n , but standard techniques allow one to port them also to smooth manifolds such as T. Proofs for the following two results can be found, e.g., in [72, Theorem 5.3.6/1 (ii)] and [91,2.8.2,Eq. (19)].
Proof. We follow the argumentation in Theorem 5.1 from [1]: The norm ψ · X is Lipschitz with Lipschitz constant 1. Utilizing the Sobolev embedding for scalar functions (Lemma A.2)