Total Variation and Mean Curvature PDEs on the Homogeneous Space of Positions and Orientations

Two key ideas have greatly improved techniques for image enhancement and denoising: the lifting of image data to multi-orientation distributions and the application of nonlinear PDEs such as total variation flow (TVF) and mean curvature flow (MCF). These two ideas were recently combined by Chambolle and Pock (for TVF) and Citti et al. (for MCF) for two-dimensional images. In this work, we extend their approach to enhance and denoise images of arbitrary dimension, creating a unified geometric and algorithmic PDE framework, relying on (sub-)Riemannian geometry. In particular, we follow a different numerical approach, for which we prove convergence in the case of TVF by an application of Brezis–Komura gradient flow theory. Our framework also allows for additional data adaptation through the use of locally adaptive frames and coherence enhancement techniques. We apply TVF and MCF to the enhancement and denoising of elongated structures in 2D images via orientation scores and compare the results to Perona–Malik diffusion and BM3D. We also demonstrate our techniques in 3D in the denoising and enhancement of crossing fiber bundles in DW-MRI. In comparison with data-driven diffusions, we see a better preservation of bundle boundaries and angular sharpness in fiber orientation densities at crossings.


Introduction
In the last decade, many PDE-based image-analysis techniques for tracking and enhancement of curvilinear structures in images took advantage of lifting image data, typically defined on $\mathbb{R}^d$, to a multi-orientation distribution (e.g., an orientation score) defined on the homogeneous space $\mathbb{M}_d$ of $d$-dimensional positions and orientations, see Fig. 1 and [5,8,11,14,20,53]. After lifting the image to a multi-orientation distribution, the distribution is taken as the initial condition of a PDE flow. After solving a limited number of iterations of the PDE model, one obtains a regularized version of the original distribution, and by integration over all orientations, one obtains a regularized version of the original image. The key advantage of lifting the images from $\mathbb{R}^d$ to the homogeneous space $\mathbb{M}_d$ is that the PDE flow can act differently on substructures with different orientations [5,11,24]. For instance, if the image contains two crossing lines, the PDE can regularize the two lines independently, rather than regularizing the whole crossing. Similarly, if the image contains a corner, the corner is preserved in the regularized image.
PDE flows on orientation lifts of 3D images are relevant for applications such as fiber enhancement [17,21,46,53] and fiber tracking [45] in diffusion-weighted magnetic resonance imaging (DW-MRI), and in enhancement [33] and tracking [15] of blood vessels in 3D images. Instead of direct PDE-based processing of an image, we apply PDE-based processing on a lifted image $U : \mathbb{R}^d \times S^{d-1} \to \mathbb{R}$ (e.g., an orientation score, OS), obtained by convolving the image with a set of rotated wavelets allowing for stable reconstruction [5,20,33]. (Fig. 1, 2nd row: vessel tracking in a 2D image via geodesic PDE flows in the OS that underlie TVF [5,11,24], with $n = (\cos\theta, \sin\theta)^T \in S^1$; 3rd row: CED-OS diffusion of a 3D image [23,33] visualized as a field of angular profiles, see Remark 1.) In this article we study image enhancement and denoising via TVF and MCF on $\mathbb{M}_d = \mathbb{R}^d \rtimes S^{d-1}$ and compare to nonlinear diffusion methods. The general workflow is illustrated in Fig. 1. The original image is described by a function $f : \Omega_f \to \mathbb{R}^+$, where $\Omega_f \subset \mathbb{R}^d$ is its support. From $f \in L^2(\Omega_f)$, one computes an orientation lift $U : \mathbb{M}_d \to \mathbb{C}$, compactly supported within $\Omega = \Omega_f \times S^{d-1}$. There are various ways to construct such a lift: it can be (the real part of) an invertible orientation score [22] (cf. Fig. 1), a channel representation [28], a lift by Gabor wavelets [3], or a fiber orientation density [44]. In all of these approaches the absolute value $|U(x, n)|$ can be regarded (after normalization) as a probability density of finding a fiber structure at position $x \in \mathbb{R}^d$ with local orientation $n \in S^{d-1}$. We set the orientation lift $U$ as the initial condition of a PDE flow $U \mapsto \Phi_t(U)$ with evolution time $t > 0$. Finally, the processed multi-orientation representation $\Phi_t(U)$ is integrated over all orientations to obtain the enhanced image $f_t$. In this article, we will work with the orientation score, with the main motivation that this operation is invertible [20], so that when taking $t \downarrow 0$, the output equals the input, i.e., $\lim_{t \downarrow 0} f_t = f$ in $L^2$-sense. The enhanced image that one obtains after running a PDE flow (the bottom-right picture in Fig. 1) naturally depends on the type of flow used. One flow may be more suitable than another, depending on the requirements imposed on the resulting image. In case it is important to preserve sharp transitions in the image, while maintaining plateaus, nonlinear flows such as total variation flow (TVF) and mean curvature flow (MCF) [49] are typically better suited than nonlinear diffusion flows (such as Perona–Malik diffusion [42] and coherence-enhancing diffusion [54]).
For d = 2, TVF and MCF were recently generalized to lifted images by Chambolle and Pock [11] and Citti et al. [13], respectively.
Their promising results have motivated us to generalize TVF and MCF to lifted images for general dimension d and provide a general geometric and algorithmic framework that can accommodate features such as locally adaptive frames and coherence enhancement.
The benefits of our approach are that we obtain a single unifying geometric and algorithmic framework for arbitrary d, with efficient algorithms (for d = 2, 3) that preserve crossing lines, corners, plateaus, edges and bundle boundaries and can improve curvature adaptation via the optional inclusion of locally adaptive frames. Such frames account for curvature of lines and allow us to remove bias toward sampled orientations in orientation scores.
Our PDE methods on $\mathbb{M}_d$ are computationally more expensive than their counterparts acting only on $\mathbb{R}^d$, but they are still practical. Similar to crossing-preserving nonlinear diffusion on $SE(2) \equiv \mathbb{M}_2$, locally adaptive frames allow us to remove orientation-sampling bias in orientation scores [29, Fig. 6.11] and to use only 4 orientation samples [30]. For our crossing-preserving MCF and TVF on $\mathbb{M}_2$ we sample our (processed) orientation scores on only 8 orientations. On $\mathbb{M}_3$ we compute regularized orientation lifts on a grid with 162 orientations, where we rely on efficient numerical schemes for PDEs on $\mathbb{M}_3$ based on the low-order PDE discretization schemes explained in [17,37], instead of higher-order schemes via spherical harmonics [34, Ch. 3.4], in order to reduce computation time.
The structure of this article is as follows. We start by recapitulating orientation scores and explaining the homogeneous space M d as a Lie group quotient in the rigid body motion group SE(d) in Sect. 2 and explain the necessary geometric concepts. In Sect. 3, we introduce the PDEs for total variation and mean curvature flow on M d and explain our explicit discretization scheme. Our numerical scheme includes regularization for which we prove convergence in Sect. 4. In Sect. 5 we evaluate the potential of our methods with 2D and 3D experiments.

Remark 1 (Visualization of 3D orientation scores)
In the 3rd row of Fig. 1, and henceforth, we visualize a lifted image $U : \mathbb{R}^3 \times S^2 \to \mathbb{R}^+$ by a grid of angular profiles $\{\mu\, U(x, n)\, n \mid x \in \mathbb{Z}^3,\ n \in S^2\}$, with fixed $\mu > 0$.

Remark 2 (Additional content in this version)
This article is an extended version of the authors' SSVM article by the same name [25]. The following content is new:
- Proofs of the convergence results (Lemma 1, Proposition 1, and Theorem 1) that were announced in [25] but not yet proven.
- Extensions of our 2D denoising/enhancing experiments, Sect. 5, Figs. 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, and Table 4. These experiments now include a full comparison of isotropic and anisotropic processing, and the effect of including coherence enhancement (via locally adaptive frames) in TVF and MCF. They also include additional comparisons to Perona–Malik diffusion [42] and to a well-established denoising method: BM3D [18,35].
- A more comprehensive treatment of the geometric tools used, such as vector fields and metric tensors. We now clearly distinguish between the group SE(d) and the homogeneous space $\mathbb{M}_d$, and we explain how to transfer geometric tools between these sets in Sect. 2.6 and Table 2.

Preliminary Theory
Before we can provide the generalized PDEs, which include TVF and MCF as special cases, we need to construct the necessary tools.
In this section, we review orientation scores, the rigid-body motion group SE(d), and the homogeneous space $\mathbb{M}_d$ of positions and orientations. For further reading on engineering applications and harmonic analysis on the Lie group SE(d) we refer to [12, Ch. 6]. For theory on homogeneous spaces we refer to [36, Ch. 21]. For image processing on SE(2), see for example [5,8,11,14]; for image processing on SE(3), see for example [39,45,47]. Building an orientation score starts with selecting an orientation-sensitive filter (or wavelet) $\psi \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$. We can then (under appropriate conditions [20,33]) filter out a particular direction from an image $f \in L^2(\mathbb{R}^d)$ by convolving the image with this filter aligned to that direction. An orientation score $W_\psi f$ can then be constructed by applying this filtering for all directions $n \in S^{d-1}$:

$$(W_\psi f)(x, n) = \int_{\mathbb{R}^d} \overline{\psi\left(R_n^{-1}(y - x)\right)}\; f(y)\, \mathrm{d}y \tag{2}$$
for all $x \in \mathbb{R}^d$ and rotations $R_n$ that map a reference axis $a \in S^{d-1}$ to $n$. For this paper we will be using cake wavelets [20,33] for our filter $\psi$, illustrated in Fig. 2 for $d = 2$. These wavelets are directional filters with the property that we can accurately reconstruct the original image from the orientation score (again under appropriate conditions) by integration over $S^{d-1}$, i.e.,
$$f(x) \approx \int_{S^{d-1}} (W_\psi f)(x, n)\, \mathrm{d}\sigma(n),$$
where $\sigma$ denotes the usual surface measure on $S^{d-1}$. We always use the standard cake wavelet parameter settings from [37] in our experiments. The explicit formulas for these cake wavelets that allow invertible orientation scores are available in [5,20] and specifically for $d = 3$ in [33]. An intuitive illustration of an orientation score is given in Fig. 2.

The Lie Group SE(d) and the Homogeneous Space $\mathbb{M}_d$

The rigid body motion group $SE(d) = \mathbb{R}^d \rtimes SO(d)$ consists of roto-translations $g = (x, R)$, with group product
$$g_1\, g_2 = (x_1, R_1)(x_2, R_2) = (x_1 + R_1 x_2,\ R_1 R_2).$$
These roto-translations act transitively on the space $\mathbb{R}^d \times S^{d-1}$ via $g \cdot (x', n') = (x + R x', R n')$. We choose an a priori reference vector $a \in S^{d-1}$, say $a = (1, 0)^T$ if $d = 2$ or $a = (0, 0, 1)^T$ if $d = 3$. Then the stabilizer of the element $(0, a)$ is given by
$$H_d = \{(0, R) \in SE(d) \mid R a = a\},$$
which is isomorphic to $SO(d-1)$; for $d = 3$ its elements are the $(0, R_{a,\alpha})$, where $R_{a,\alpha}$ denotes a (counter-clockwise) rotation over angle $\alpha$ around the reference axis $a$. The homogeneous space of positions and orientations is the partition of left cosets
$$\mathbb{M}_d := SE(d)/H_d.$$
The left cosets are equivalence classes in SE(d) with respect to the equivalence relation
$$g_1 \sim g_2 \iff g_1^{-1} g_2 \in H_d.$$
This means that two roto-translations $g_1 = (x_1, R_1)$ and $g_2 = (x_2, R_2)$ are equivalent if and only if $x_1 = x_2$ and $R_1 a = R_2 a$. The equivalence classes $[g] = \{g' \in SE(d) \mid g' \sim g\}$ are usually just denoted by $p = (x, n)$, as they consist of all rigid body motions $g = (x, R_n)$ that map the reference point $(0, a)$ onto $(x, n) \in \mathbb{R}^d \times S^{d-1}$.

Remark 3 (Distinguishing SE(d) from $\mathbb{M}_d$) As the distinction between the group SE(d) and the homogeneous space $\mathbb{M}_d$ (which is not a group for $d = 3$ and above) is important, we will use $g, h$ for elements of SE(d), and $p, q$ for points in $\mathbb{M}_d$.
To understand why the situation changes from $d = 2$ to $d > 2$, observe that in 2 dimensions we need one angle to specify an orientation and have one rotational degree of freedom, whereas in 3 dimensions we need 2 angles to specify an orientation but have 3 rotational degrees of freedom, i.e., $\mathbb{M}_3$ has one dimension less than SE(3). See also Fig. 3 for an illustration of this difference.
For d > 3 this situation persists as we have more rotational degrees of freedom that do not change the orientation.

Remark 4 (Domain of an orientation score)
The orientation score is well defined on the domain $\mathbb{M}_d$ if we assume that $\psi$ is invariant under the action of the subgroup $H_d$. For $d = 3$ this means we must impose axial symmetry on the wavelets; for details see [33].
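To make the lifting step concrete, the following sketch builds a discrete orientation score for $d = 2$ by filtering an image with rotated copies of an oriented filter. It is a minimal illustration under stated assumptions, not the implementation used in this article: the function names are ours, and a generic oriented filter `psi` stands in for the cake wavelets of [20], whose explicit formulas we do not reproduce here.

```python
import numpy as np
from scipy.ndimage import convolve, rotate

def orientation_score_2d(f, psi, n_orientations=8):
    """Lift a 2D image f to a discrete orientation score U[k, x, y] by
    filtering f with rotated copies of an oriented filter psi.
    psi is a stand-in for the cake wavelets of [20]; any real-valued
    filter with a preferred axis illustrates the idea."""
    thetas = np.linspace(0.0, 2 * np.pi, n_orientations, endpoint=False)
    U = np.empty((n_orientations,) + f.shape)
    for k, th in enumerate(thetas):
        # align the filter's reference axis a with n = (cos th, sin th)
        psi_k = rotate(psi, np.degrees(th), reshape=False, order=1)
        U[k] = convolve(f, psi_k, mode='nearest')
    return thetas, U

def reconstruct(U):
    """Approximate inverse of the lift: integrate over all orientations."""
    return U.mean(axis=0)
```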

Differential Structure on SE(d) and $\mathbb{M}_d$
As a manifold, we view the group SE(d) in the standard way as a submanifold of $\mathbb{R}^d \times \mathbb{R}^{d \times d}$. The Lie algebra is, as a vector space, the tangent space at the unit element (see [36, Ch. 7]). We view elements of tangent spaces (i.e., tangent vectors) as derivations acting on functions: if $v$ is an ordinary vector in $\mathbb{R}^d \times \mathbb{R}^{d \times d}$ tangent to SE(d), the corresponding derivation acting on a function $f \in C^1(SE(d))$ is just the derivative of $f$ in the direction of $v$.
The Lie algebra has dimension $D = \frac{1}{2}d(d+1)$. We choose a basis $(A_i)_{i=1}^D$ for the Lie algebra of SE(d) with the following properties. The basis is orthonormal with respect to the inner product belonging to the standard Euclidean metric on $\mathbb{R}^d \times \mathbb{R}^{d \times d}$; the vectors $\{A_1, \ldots, A_d\}$ span the spatial part of the Lie algebra, which is isomorphic to $\mathbb{R}^d$, with the vector $A_d$ corresponding to the derivative in the direction of $a$. Recall that for $d = 2$ the subgroup $H_d$ is trivial. For $d \geq 3$ the set $\{A_{2d}, \ldots, A_D\}$ forms a basis for the Lie algebra of the stabilizer subgroup $H_d$, and we take the convention that the remaining generators $\{A_{d+1}, \ldots, A_{2d-1}\}$ correspond to rotations that move the reference axis $a$. For the case $d = 2$ this gives us two spatial generators, $A_1$ and $A_2$, and one rotation generator $A_3$. Moving to $d = 3$ we have 3 spatial degrees of freedom and 3 rotational degrees of freedom, but only 2 of those rotational degrees of freedom will change the orientation; we denote the generator corresponding to the rotation that preserves the reference axis by $A_6$. This gives us the following basis: $\{A_1, A_2, A_3\}$ spatial (with $A_3$ along $a$), $\{A_4, A_5\}$ the rotations moving $a$, and $\{A_6\}$ the rotation about $a$. As is illustrated in Fig. 3, for $d = 3$ we have a rotational degree of freedom that does not change the orientation reference axis. As a result $\mathbb{M}_3$ is not isomorphic to SE(3); it is rather a 5-dimensional quotient of the 6-dimensional Lie group SE(3), see also Remark 3.
Remark 5 (Generalization for d > 3) Generalizing this scheme for $d > 3$, the basis of the Lie algebra consists of the spatial generators $\{A_1, \ldots, A_d\}$ (with $A_d$ along $a$), the generators $\{A_{d+1}, \ldots, A_{2d-1}\}$ of rotations that move the reference axis, and the generators $\{A_{2d}, \ldots, A_D\}$ of the stabilizer $H_d$.

We extend the vectors $A_i$ to left-invariant vector fields $\mathcal{A}_i$ as follows. The group acts on itself by left multiplication $L_g h := g h$, and the derivation $\mathcal{A}_i|_g$, evaluated at a point $g$, is given by the pushforward
$$\mathcal{A}_i|_g\, f = \big((L_g)_* A_i\big) f = A_i (f \circ L_g)$$
for all $f \in C^\infty(SE(d), \mathbb{R})$. We denote the corresponding covector fields by $\omega^i : g \mapsto \omega^i|_g$. For each $g \in SE(d)$, the covector $\omega^i|_g$ is an element of the dual of the tangent space of SE(d) at $g$. The covector fields are characterized by
$$\langle \omega^i|_g, \mathcal{A}_j|_g \rangle = \delta^i_j,$$
where $\delta^i_j$ denotes the Kronecker delta. Note that $A_d$ is the spatial generator along the reference axis $a$, and so for all $g = (x, R_n) \in SE(d)$:
$$\mathcal{A}_d|_{(x, R_n)} = n \cdot \nabla_{\mathbb{R}^d},$$
from which we infer that the left-invariant frame is aligned with the direction $n \in S^{d-1}$.

Remark 6 (Left-Invariant Basis in 2D)
We can represent an element of SE(2) by its position and angle as $(x, \theta) \in \mathbb{R}^2 \times [0, 2\pi)$, which allows us to write the left-invariant vector fields $\mathcal{A}_i$ as:
$$\mathcal{A}_1 = -\sin\theta\, \partial_{x^1} + \cos\theta\, \partial_{x^2}, \qquad \mathcal{A}_2 = \cos\theta\, \partial_{x^1} + \sin\theta\, \partial_{x^2}, \qquad \mathcal{A}_3 = \partial_\theta. \tag{11}$$
For an explicit form of the left-invariant vector fields $\mathcal{A}_i$ in case $d = 3$, see "Appendix A." We introduce the following metric tensor field in terms of the left-invariant covector fields.

Definition 1 (Left-invariant metric tensor field)
Given positive constants $D_S > 0$ and $D_A > 0$, and a nonnegative real number $e \geq 0$, we define the left-invariant metric tensor field $\mathcal{G}$ by
$$\mathcal{G}|_g := \frac{1}{e\, D_S} \sum_{i=1}^{d-1} \omega^i|_g \otimes \omega^i|_g \;+\; \frac{1}{D_S}\, \omega^d|_g \otimes \omega^d|_g \;+\; \frac{1}{D_A} \sum_{i=d+1}^{2d-1} \omega^i|_g \otimes \omega^i|_g. \tag{12}$$

Remark 7 (Sub-Riemannian case) Henceforth we refer to $e = 0$ as the sub-Riemannian case, where tangent vectors are constrained to the span of $\mathcal{A}_d, \ldots, \mathcal{A}_{2d-1}$. Intuitively, when $e \downarrow 0$ the other tangent directions get infinite cost and become prohibited. This means that we restrict ourselves to so-called horizontal tangent vectors:
$$\dot{g} \in \mathrm{span}\{\mathcal{A}_d|_g, \ldots, \mathcal{A}_{2d-1}|_g\}. \tag{13}$$
Observe that this sub-Riemannian metric tensor is defined (and invertible) on a sub-bundle of the tangent bundle of the group, as it does not contain any of the covectors dual to the sub-bundle induced by the subgroup $H_d$. Furthermore, it is spatially isotropic orthogonal to the primary spatial direction. Spherically, we also impose isotropy in the metric, as can be seen from the last term in the above definition.
This metric induces an associated norm: if $\dot{g} \in T_g(SE(d))$ with components $\dot{g}^i = \langle \omega^i|_g, \dot{g} \rangle$, then
$$\|\dot{g}\|_{\mathcal{G}} := \sqrt{\mathcal{G}|_g(\dot{g}, \dot{g})},$$
where again in the sub-Riemannian case we only allow $\dot{g}$ to be in the span of $\mathcal{A}_d, \ldots, \mathcal{A}_{2d-1}$. Now that we have SE(d) equipped with a (sub-)Riemannian metric tensor, we can derive the basic tools that are required to formulate our geometric PDEs. These basic tools include the gradient, its norm, and the divergence of a vector field. Let us relabel our parameters as
$$D_i := e\, D_S \ \text{ for } 1 \leq i \leq d-1, \qquad D_d := D_S, \qquad D_i := D_A \ \text{ for } d+1 \leq i \leq 2d-1,$$
so that $\mathcal{G} = \sum_{i=1}^{2d-1} \frac{1}{D_i}\, \omega^i \otimes \omega^i$. Let $\tilde{U} : SE(d) \to \mathbb{R}$ carry the axial symmetry:
$$\tilde{U}(g\, h) = \tilde{U}(g) \quad \text{for all } g \in SE(d),\ h \in H_d. \tag{16}$$
Then in the Riemannian setting the gradient of a differentiable function $\tilde{U} : SE(d) \to \mathbb{R}$ on the group induced by this metric tensor becomes
$$\nabla_{\mathcal{G}}\, \tilde{U} = \sum_{i=1}^{2d-1} D_i\, (\mathcal{A}_i \tilde{U})\, \mathcal{A}_i,$$
where the sum only runs to $2d-1$ and not to $\dim(SE(d)) = D = \frac{1}{2}d(d+1)$, since (16) implies that $\mathcal{A}_i \tilde{U} = 0$ for $i \geq 2d$. The gradient then has the following norm:
$$\|\nabla_{\mathcal{G}}\, \tilde{U}\|_{\mathcal{G}} = \Big( \sum_{i=1}^{2d-1} D_i\, |\mathcal{A}_i \tilde{U}|^2 \Big)^{1/2}. \tag{18}$$
The divergence of a vector field $v = \sum_{i=1}^{2d-1} v^i\, \mathcal{A}_i$ is given by
$$\mathrm{div}\, v = \sum_{i=1}^{2d-1} \mathcal{A}_i\, v^i.$$
In the sub-Riemannian setting we restrict ourselves to vector fields spanned by $\mathcal{A}_d, \ldots, \mathcal{A}_{2d-1}$.
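For $d = 2$ these tools reduce to simple finite-difference expressions. The following sketch (our own, with periodic boundaries via `np.roll` for brevity, whereas the experiments in this article use Neumann boundary conditions spatially) evaluates the left-invariant derivatives of Remark 6 and the squared gradient norm (18) on a sampled orientation score.

```python
import numpy as np

def left_invariant_derivatives(U, thetas, h=1.0):
    """A_i U for U sampled on a grid U[x, y, theta] over SE(2), using
    central differences. A_2 is the 'forward' derivative n . grad and
    A_3 = d/dtheta, per the frame (11) of Remark 6."""
    dx = (np.roll(U, -1, axis=0) - np.roll(U, 1, axis=0)) / (2 * h)
    dy = (np.roll(U, -1, axis=1) - np.roll(U, 1, axis=1)) / (2 * h)
    h_a = thetas[1] - thetas[0]
    dth = (np.roll(U, -1, axis=2) - np.roll(U, 1, axis=2)) / (2 * h_a)
    c, s = np.cos(thetas), np.sin(thetas)  # broadcasts over the theta axis
    return (-s * dx + c * dy,              # A_1: orthogonal to n
            c * dx + s * dy,               # A_2: along n
            dth)                           # A_3: angular

def gradient_norm_sq(U, thetas, D_S=1.0, D_A=0.1, e=1.0):
    """||grad U||_G^2 = sum_i D_i |A_i U|^2 with D_1 = e D_S, D_2 = D_S,
    D_3 = D_A, cf. (18); parameter values here are illustrative."""
    A1, A2, A3 = left_invariant_derivatives(U, thetas)
    return e * D_S * A1**2 + D_S * A2**2 + D_A * A3**2
```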

Locally Adaptive Frames on SE(d) as SVD of the Hessian
As an alternative to the left-invariant frame, we can choose a frame (and subsequently a metric tensor field) that is adapted to the data (which we also refer to as a gauge frame, in analogy with [23]). Specifically, instead of having the vector field $\mathcal{A}_d = n \cdot \nabla$ as a static forward direction, we want to choose a vector field $\mathcal{B}_d$ that locally aligns with the data [23]. In particular, $\mathcal{B}_d$ can take on an angular component, meaning the local "straight forward" will follow the curve of the data; consequently, flows can better follow curved structures, see Fig. 4 for an example.

Remark 8 (Fitting a frame)
We can induce an entire frame in SE(d) from a choice of main vector, see [23, Appendix A] for details. For an intuitive illustration see Fig. 4. In this article we will focus on the method by which the main gauge vector is obtained.
Next we will present a singular value decomposition of the Hessian; we will choose the eigenvector associated with the smallest eigenvalue as $\mathcal{B}_d$. Geometrically, this can be seen as the direction in which the gradient changes the least. Before we can formulate this procedure, we explain the concept of exponential curves (see Fig. 5).
Definition 2 (Exponential curve) Let $\dot{g} \in T_g(SE(d))$; then the exponential curve parameterized by $t$ through $g$ with tangent vector $\dot{g}$ is written as $t \mapsto e^{t \dot{g}}\, g$, the curve for which $e^{0 \dot{g}}\, g = g$ and whose tangent vector components with respect to the left-invariant frame are constant: writing $\dot{g} = \sum_{i=1}^{2d-1} \dot{g}^i\, \mathcal{A}_i|_g$, we have for all $t \in \mathbb{R}$:
$$\frac{\mathrm{d}}{\mathrm{d}t}\, e^{t \dot{g}}\, g = \sum_{i=1}^{2d-1} \dot{g}^i\, \mathcal{A}_i\big|_{e^{t \dot{g}} g}.$$
(Fig. 4, top: in the left-invariant frame (11) the red line indicates the exponential curve associated with $\mathcal{A}_2$; bottom: we choose a frame that takes into account the local curvature, where the green line indicates the exponential curve associated with $\mathcal{B}_2$.) Hence the exponential curves are those curves whose tangent vector components with respect to the left-invariant frame do not change. For an illustration of such curves for the case $d = 2$ see Fig. 5.
In view of (16) and (18) we define the restricted tangent space $T(g) := \mathrm{span}\{\mathcal{A}_1|_g, \ldots, \mathcal{A}_{2d-1}|_g\}$. Now we want to select $\mathcal{B}_d|_g$ (normalized with respect to the existing metric tensor $\mathcal{G}|_g$) so that the gradient of the data $\tilde{U} \in C^1(SE(d), \mathbb{R})$ changes as little as possible (recall Fig. 4), in the following manner.

Definition 3 (Main gauge vector)
We define the main gauge vector as
$$\mathcal{B}_d|_g := \underset{\dot{g} \in T(g),\ \|\dot{g}\|_{\mathcal{G}} = 1}{\operatorname{argmin}}\ \big\| \big(\nabla^2 \tilde{U}\big)(g)\, \dot{g} \big\|_{\mathcal{G}}, \tag{22}$$
where we assume that $\tilde{U}$ is such that we have a unique minimizer. (Fig. 5: a set of exponential curves through a common point for $d = 2$; exponential curves are those curves whose tangent vectors are part of a left-invariant vector field. Taken with permission from [5].) The Hessian $\nabla^2 \tilde{U}$ in the previous equation is induced by a Cartan connection, as outlined in [23, Appendix 4, (133)].
Writing the tangent vector in terms of the local left-invariant frame as $\dot{g} = \sum_{i=1}^{2d-1} \dot{g}^i\, \mathcal{A}_i|_g \in T(g)$, we write out the Hessian as follows:
$$\big(\nabla^2 \tilde{U}\big)\, \dot{g} = \sum_{i=1}^{2d-1} \Big( \sum_{j=1}^{2d-1} \big(\mathcal{A}_j \mathcal{A}_i \tilde{U}\big)\, \dot{g}^j \Big)\, \mathcal{A}_i|_g.$$
We can write this problem in terms of matrices by defining
$$K := \big( \mathcal{A}_j \mathcal{A}_i \tilde{U}(g) \big)_{i,j=1}^{2d-1}, \qquad M := \mathrm{diag}\big( D_1^{-1/2}, \ldots, D_{2d-1}^{-1/2} \big),$$
with $i$ as row index and $j$ as column index.
Using these, the objective function in (22) becomes
$$\dot{g} \mapsto \dot{g}^T K^T M^2 K\, \dot{g}, \tag{23}$$
which we want to minimize under the constraint $\dot{g}^T M^2\, \dot{g} = 1$. Taking the derivative of the Lagrangian of this convex optimization problem gives optimality under the following condition ($\lambda \in \mathbb{R}$):
$$K^T M^2 K\, \dot{g} = \lambda\, M^2\, \dot{g}, \tag{25}$$
i.e., $\dot{g}$ needs to be an eigenvector of the matrix $M^{-2} K^T M^2 K$ with eigenvalue $\lambda$ (serving as the Lagrange multiplier). Multiplying (25) from the left by $\dot{g}^T$ shows that for such a constrained eigenvector the objective function (23) evaluates to
$$\dot{g}^T K^T M^2 K\, \dot{g} = \lambda\, \dot{g}^T M^2\, \dot{g} = \lambda.$$
This last equation incidentally proves that the eigenvalues of $M^{-2} K^T M^2 K$ are nonnegative and, more importantly, that to minimize the change in gradient we need to look at the eigenvector belonging to the smallest eigenvalue.
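Pointwise, this eigenvector selection can be carried out with a generalized eigensolver. The sketch below is our reading of (25) (with $M^2 = \mathrm{diag}(1/D_i)$) and not the article's Mathematica implementation; it requires the Riemannian case $D_i > 0$.

```python
import numpy as np
from scipy.linalg import eigh

def main_gauge_vector(K, D):
    """Solve (22)-(25) at one group element: minimize
    gdot^T K^T M^2 K gdot  subject to  gdot^T M^2 gdot = 1,
    with M^2 = diag(1/D_i). K is the (2d-1)x(2d-1) matrix of Hessian
    coefficients A_j A_i U; returns the main gauge vector B_d."""
    M2 = np.diag(1.0 / np.asarray(D, dtype=float))
    lam, vecs = eigh(K.T @ M2 @ K, M2)       # generalized eigenproblem,
    gdot = vecs[:, 0]                        # eigenvalues in ascending order
    return gdot / np.sqrt(gdot @ M2 @ gdot)  # normalize w.r.t. the metric G
```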
In practice, we do not immediately calculate the eigenvectors and eigenvalues from the scheme we have just proposed, but for the purpose of stability we first apply a componentwise Gaussian smoothing to the matrix field $K$: each component $K_{ij}$ is convolved over $\mathbb{R}^d \times S^{d-1}$ (with the usual surface measure $\sigma$ on $S^{d-1}$) with the smoothing kernel
$$G(y, m) := G^{\mathbb{R}^d}_{\rho_s}(0, y)\; G^{S^{d-1}}_{\rho_a}(a, m),$$
where $G^M_\rho$ is the heat kernel on the Riemannian manifold $M$ with timescale $\rho > 0$, the spatial kernel is centered on $0$, and the orientation kernel is centered on the reference direction $a$.
Remark 9 (Diffusion on $\mathbb{M}_d$) It is important in the context of $\mathbb{M}_d$ to choose diffusion that is isotropic spatially (with timescale $\rho_s$) and spherically (with timescale $\rho_a$), since this is the only diffusion that commutes with the left-invariant vector fields. Note that $G^{\mathbb{R}^d}_{\rho_s}(0, y)$ depends only on $\|y\|$ and $G^{S^{d-1}}_{\rho_a}(a, m)$ depends only on $\arccos(a \cdot m)$, making $G(y, m)$ the heat kernel on the product manifold $\mathbb{R}^d \times S^{d-1}$. This smoothing method is a variant of the one used in [23].
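For $d = 2$, where $S^1$ is a periodic axis on a regular $(x, y, \theta)$-grid, this product heat kernel can be approximated componentwise with separable Gaussians; the sketch below (ours, with the standard conversion $\sigma = \sqrt{2\rho}$ between heat-kernel timescale and Gaussian width in grid units) is a practical stand-in for the smoothing just described.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_hessian_field(K, rho_s=1.0, rho_a=0.5):
    """Componentwise smoothing of the field K[x, y, theta, i, j]:
    spatial Gaussians plus a periodic ('wrap') Gaussian along theta,
    approximating the product heat kernel of Remark 9."""
    sig_s, sig_a = np.sqrt(2 * rho_s), np.sqrt(2 * rho_a)
    out = np.empty_like(K)
    for i in range(K.shape[-2]):
        for j in range(K.shape[-1]):
            out[..., i, j] = gaussian_filter(
                K[..., i, j], sigma=(sig_s, sig_s, sig_a),
                mode=('nearest', 'nearest', 'wrap'))
    return out
```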
The remaining basis vectors are determined by considering a rotation that maps $\mathcal{A}_d|_g$ to $\mathcal{B}_d|_g$ and then applying a specific rotation to the remaining $\mathcal{A}_i|_g$ that keeps the remaining spatial generators spatial. For an illustration see Fig. 4. How this rotation is chosen and applied is detailed in [23, Appendix A]. Having determined a data-adaptive frame $(\mathcal{B}_i)_{i=1}^{2d-1}$ (recall Fig. 4), we equip it with the following straightforward metric, where again we rely on the corresponding dual frame $(\beta^i)_{i=1}^{2d-1}$ given by $\langle \beta^i, \mathcal{B}_j \rangle = \delta^i_j$.

Definition 4 (Gauge metric tensor field) We define the gauge metric tensor field $g \mapsto \mathcal{J}|_g(\cdot, \cdot)$ as
$$\mathcal{J}|_g := \sum_{i=1}^{2d-1} \beta^i|_g \otimes \beta^i|_g, \tag{30}$$
which induces a norm on $\dot{g} \in T_g(SE(d))$:
$$\|\dot{g}\|_{\mathcal{J}} := \sqrt{\mathcal{J}|_g(\dot{g}, \dot{g})},$$
a gradient of $\tilde{U} \in C^1(SE(d))$:
$$\nabla_{\mathcal{J}}\, \tilde{U} = \sum_{i=1}^{2d-1} (\mathcal{B}_i \tilde{U})\, \mathcal{B}_i,$$
with norm
$$\|\nabla_{\mathcal{J}}\, \tilde{U}\|_{\mathcal{J}} = \Big( \sum_{i=1}^{2d-1} |\mathcal{B}_i \tilde{U}|^2 \Big)^{1/2},$$
and finally gives the divergence of a vector field expressed in the gauge frame as $v = \sum_{i=1}^{2d-1} v^i\, \mathcal{B}_i$:
$$\mathrm{div}\, v = \sum_{i=1}^{2d-1} \mathcal{B}_i\, v^i.$$

Coherence Enhancement Operator
Coherence-enhancing diffusion is a well-known technique for image enhancement [54]. It is intended for line amplification rather than strictly denoising. Crossing-preserving versions on $\mathbb{M}_d$ have been developed [23] and evaluated for denoising. Here, crossing lines are well enhanced, but plateaus and boundaries of line structures are damaged. Therefore we propose to include the coherence enhancement technique in TV and MC flows. Next we explain how this coherence enhancement operator is constructed from an orientation confidence.
In $\mathbb{R}^3$, orientation confidence is calculated by the Laplacian in the subspace orthogonal to the line structure. We can take a similar approach in $\mathbb{M}_d$ by taking the Laplacian in the space spanned by $(\mathcal{A}_i)_{i=1, i \neq d}^{2d-1}$. Recall that $\mathcal{A}_d$ is implicitly aligned with the local line structure along $n$. In the gauge-frame setting $\mathcal{B}_d$ is explicitly aligned with the line structure (see Fig. 4), and therefore we there take the Laplacian in the span of $(\mathcal{B}_i)_{i=1, i \neq d}^{2d-1}$. In the sub-Riemannian case (i.e., $D_1 = \cdots = D_{d-1} = 0$) this just reduces to the second derivatives in the $(d-1)$-dimensional spaces spanned by $(\mathcal{A}_i)_{i=d+1}^{2d-1}$ and $(\mathcal{B}_i)_{i=d+1}^{2d-1}$, respectively. With that in mind we define orientation confidence in SE(d) as follows.
Let $\tilde{U} : SE(d) \to \mathbb{R}$; then in the left-invariant case we define
$$C_{\tilde{U}} := \sum_{i=1, i \neq d}^{2d-1} D_i\, \mathcal{A}_i^2\, \tilde{U}, \tag{35}$$
or, in the gauge-frame case,
$$C_{\tilde{U}} := \sum_{i=1, i \neq d}^{2d-1} D_i\, \mathcal{B}_i^2\, \tilde{U}. \tag{36}$$
Note that the $\mathcal{B}_i$'s are normalized with respect to the old metric (12), and as such the parameters $D_i$ are still included in (36). Where the gauge frame aligns with the left-invariant frame, (35) and (36) coincide.
Definition 5 (Isotropy factor) Let $c > 0$ be a chosen scaling constant; then the isotropy factor $\alpha_{\tilde{U}}$ is defined as a smoothly decaying function, rescaled by $c$, of the orientation confidence $C_{\tilde{U}}$ defined by (35), respectively (36).
What is convenient about this quantity is that it takes values in the range $(0, 1]$, with a value close to zero indicating a high degree of anisotropy and a value of 1 indicating perfect isotropy. This is the quantity that we use to steer the flow. The choice of $c$ controls how steep the decline of the isotropy factor is. Its appropriate value depends on the application and on exactly how the data are represented numerically (normalized to $[0, 1]$ in our case) and is best determined heuristically or by histogram. For our experiments we have used $c = 0.2$.
Using this scalar function $\alpha_{\tilde{U}}$ on the group SE(d), we can locally modify vectors based on how certain we are that the data are locally aligned. We refer to this modification of vector fields as coherence enhancement (as in coherence-enhancing diffusion [30]). Tangent vectors (such as the gradient, as we will see) are modified as follows. Let $v = \sum_{i=1}^{2d-1} v^i\, \mathcal{A}_i$ be a vector field on SE(d). Then the coherence-enhanced vector field is given as
$$E_{\mathcal{G}}(v) := v^d\, \mathcal{A}_d + \alpha_{\tilde{U}} \sum_{i=1, i \neq d}^{2d-1} v^i\, \mathcal{A}_i \tag{37}$$
for the left-invariant geometry and as
$$E_{\mathcal{J}}(v) := v^d\, \mathcal{B}_d + \alpha_{\tilde{U}} \sum_{i=1, i \neq d}^{2d-1} v^i\, \mathcal{B}_i \tag{38}$$
for the gauge geometry (with $v = \sum_i v^i\, \mathcal{B}_i$). Intuitively, these linear operators $E_{\mathcal{G}}, E_{\mathcal{J}} : T(SE(d)) \to T(SE(d))$ preserve the magnitude of the vector in the main direction and weaken it orthogonal to the main direction when we are certain the data are locally aligned with the main direction.
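The following sketch instantiates the isotropy factor and the operator $E$ componentwise. The exponential profile for $\alpha$ is an assumed instantiation satisfying the stated properties (values in $(0, 1]$, decline steered by $c$), not necessarily the exact formula of Definition 5; the helper names are ours.

```python
import numpy as np

def isotropy_factor(C, c=0.2):
    """Maps orientation confidence C to (0, 1]: close to 0 for strongly
    aligned (anisotropic) data, 1 for perfectly isotropic data. The
    exponential profile is an assumed instantiation of Definition 5;
    c controls how steep the decline is."""
    return np.exp(-np.abs(C) / c)

def coherence_enhance(v, alpha, main=1):
    """E of (37)/(38) applied to frame components v = [v^1, ..., v^{2d-1}]:
    preserve the main component (index `main`, i.e., A_d or B_d),
    damp the others by alpha."""
    return [vi if i == main else alpha * vi for i, vi in enumerate(v)]
```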

Descending to the Homogeneous Space
So far we have developed two distinct geometries on the group SE(d), which are summarized in Table 1. We can bring these geometries down to the homogeneous space $\mathbb{M}_d$ by considering the natural extension of functions and vector fields on $\mathbb{M}_d$ to SE(d). Consider a function $U \in C^\infty(\mathbb{M}_d)$; then the function $\tilde{U}$, given by
$$\tilde{U}(g) := U(x, R a) \quad \text{for all } g = (x, R) \in SE(d), \tag{39}$$
is its natural extension to SE(d) and is clearly also smooth. Similarly, a tangent vector field (recall that we understand these as differential operators acting on scalar functions) $v$ on $\mathbb{M}_d$ can be extended by
$$\tilde{v}|_g\, \tilde{U} := v|_{(x, R a)}\, U; \tag{40}$$
under the additional constraint that $\tilde{v}$ vanishes in the directions induced by the subgroup $H_d$ (i.e., $\langle \omega^i, \tilde{v} \rangle = 0$ for all $i \geq 2d$), this extension is unique.
Having extended functions and vector fields upward to the group, we can apply the tools from Table 1 to them and subsequently project the results back to the homogeneous space by the mapping $(x, R) \mapsto (x, R a)$. This mapping is not injective. Nevertheless, thanks to the metrics (12), (30) being laterally and spherically isotropic and to the way we extend functions to the group by (39), (40), all the tools we list in Table 2 are well defined on $\mathbb{M}_d$; the choice of the particular $R_n \in SO(d)$ that maps $a$ to $n$ does not affect any of them.
Remark 10 (Choice of $R_n$) While the choice of mapping $n \mapsto R_n$ does not matter for the final result, a choice does have to be made in an implementation when $d \geq 3$. The most straightforward manner is selecting the $R_n$ that is an in-plane rotation, meaning the plane of rotation is spanned by $a$ and $n$. In the two cases where this is not possible (i.e., $n = \pm a$) we pick $R_a = I$ and $R_{-a} = R_{e_y, \pi}$, where $R_{e_y, \pi}$ denotes the rotation around the axis $e_y$ by an angle $\pi$. Concretely, the in-plane rotation in 3D is given in terms of the ZYZ-Euler angles $\alpha, \beta, \gamma$ by requiring that $\alpha = -\gamma$, which gives the mapping $n \mapsto R_n = R_{e_z, \alpha}\, R_{e_y, \beta}\, R_{e_z, -\alpha}$ for the unique $\alpha \in [0, 2\pi)$ and $\beta \in (0, \pi)$ such that the resulting rotation maps $a$ to $n$.
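In code, this in-plane rotation can be evaluated directly with Rodrigues' formula instead of Euler angles; the following sketch (our own) handles the two special cases of Remark 10 explicitly.

```python
import numpy as np

def in_plane_rotation(n, a=np.array([0.0, 0.0, 1.0]), eps=1e-12):
    """R_n in SO(3) mapping the reference axis a onto n within the plane
    spanned by a and n (Rodrigues' formula), following Remark 10.
    Special cases: R_a = I and R_{-a} = rotation by pi about e_y."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    c = float(a @ n)                    # cos of the rotation angle beta
    if c > 1 - eps:                     # n = +a
        return np.eye(3)
    if c < -1 + eps:                    # n = -a: rotate by pi about e_y
        return np.diag([-1.0, 1.0, -1.0])
    w = np.cross(a, n)                  # rotation axis scaled by sin(beta)
    W = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])
    # R = I + W + W^2 (1 - c) / |w|^2, with |w|^2 = sin^2(beta)
    return np.eye(3) + W + W @ W * (1.0 - c) / (w @ w)
```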

PDE System
On $\mathbb{R}^n$ the formulation of total variation is built on the identity
$$\mathrm{div}(U\, v) = U\, \mathrm{div}\, v + \langle \mathrm{d}U, v \rangle. \tag{42}$$
From the last equation we deduce the following integration by parts formula:
$$\int_\Omega U\, \mathrm{div}\, v\; \mathrm{d}\mu = -\int_\Omega \langle \mathrm{d}U, v \rangle\, \mathrm{d}\mu \tag{43}$$
for all $U \in C^1(\Omega)$ and all smooth vector fields $v$ vanishing at the boundary $\partial\Omega$. This formula allows us to build a weak formulation of TVF on $\mathbb{M}_d$, starting from functions of bounded variation (BV) [1].

Definition 6 (Total variation on $\Omega$, recall (1))
Let $\chi_0(\Omega)$ denote the vector space of smooth vector fields that vanish at the boundary $\partial\Omega$, and let $\varepsilon \geq 0$. Then we define
$$TV_\varepsilon(U) := \sup\left\{ \int_\Omega \big( \varepsilon\, \psi - U\, \mathrm{div}\, v \big)\, \mathrm{d}\mu \;\middle|\; (\psi, v) \in C^\infty(\Omega) \times \chi_0(\Omega),\ \psi^2 + \|v\|^2 \leq 1 \right\}. \tag{44}$$
If $TV_0(U) < \infty$ we say that $U \in BV(\Omega)$.
Lemma 1 For all $U \in BV(\Omega)$ we have
$$TV_0(U) \leq TV_\varepsilon(U) \leq TV_0(U) + \varepsilon\, \mu(\Omega).$$
Recall Remark 7 about the sub-Riemannian setting, and recall the notion of horizontal tangent vectors (13); (44) also covers the sub-Riemannian setting (i.e., $e = 0$) when restricting the vector fields in $\chi_0(\Omega)$ to be horizontal. Furthermore, for $U \in C^2(\mathbb{M}_d, \mathbb{R})$ and $e, \varepsilon > 0$ we have that
$$TV_\varepsilon(U) = \int_\Omega \sqrt{\varepsilon^2 + \|\nabla U\|^2}\; \mathrm{d}\mu. \tag{46}$$

Proof First we substitute (43) into (44), then we apply the Gauss theorem and use $U v|_{\partial\Omega} = 0$. Then we apply the Cauchy–Schwarz inequality on $V_p := \mathbb{R} \times T_p\mathbb{M}_d$ for each $p \in \mathbb{M}_d$, with inner product
$$\big( (\psi_1, v_1), (\psi_2, v_2) \big) := \psi_1 \psi_2 + \mathcal{G}|_p(v_1, v_2),$$
which holds with equality iff the vectors are linearly dependent. Therefore, when we smoothly approximate $\big( \varepsilon, \nabla U \big) / \sqrt{\varepsilon^2 + \|\nabla U\|^2}$ by $(\psi, v)$, one obtains (46), and the result follows. $\square$
For vector fields $v$ on $\mathbb{M}_d$ we define the regularized norm
$$\|v\|_\varepsilon := \sqrt{\varepsilon^2 + \|v\|^2}.$$
This is a common way to regularize denominators, and in Sect. 4 we prove that this approach converges as $\varepsilon \downarrow 0$. Now we propose the following roto-translation equivariant enhancement PDE on $\Omega \subset \mathbb{M}_d$, recall (1).

Definition 7 (Equivariant enhancement PDE) Given an initial condition $U \in L^2(\Omega)$, $\varepsilon > 0$, and parameters $a, b \in \{0, 1\}$, we consider
$$\begin{cases}
\dfrac{\partial W_\varepsilon}{\partial t}(p, t) = \big\| \nabla W_\varepsilon(p, t) \big\|_\varepsilon^{a}\ \mathrm{div}\!\left( \dfrac{\nabla W_\varepsilon(\cdot, t)}{\|\nabla W_\varepsilon(\cdot, t)\|_\varepsilon^{b}} \right)\!(p), & p \in \Omega,\ t \geq 0,\\[1ex]
W_\varepsilon(p, 0) = U(p), & p \in \Omega,\\[0.5ex]
\big\langle N(x), \nabla W_\varepsilon(p, t) \big\rangle = 0, & p = (x, n) \in \partial\Omega,
\end{cases} \tag{49}$$
the gradient flow started at $U$ with evolution time $t \geq 0$. Here we use Neumann boundary conditions, with $N(x)$ the normal to the spatial boundary at $x \in \partial\Omega_f$. The coherence-enhancement version of this PDE is given by replacing $\mathrm{div}$ by $\mathrm{div} \circ E$ (recall (37) and (38)).

Remark 11 (Two versions of the PDE) This PDE system on the quotient $\mathbb{M}_d$ has two versions, depending on whether one chooses the left-invariant or the gauge geometry, as outlined in Tables 1 and 2.
We then have the following cases:
- For $(a, b) = (1, 1)$ we arrive at mean curvature flow (MCF), studied for $d = 2$ in [13].
- For $(a, b) = (0, 1)$ we arrive at total variation flow (TVF), studied for $d = 2$ in [11].
- For $(a, b) = (0, 0)$ we arrive at a linear diffusion for which exact smooth solutions exist for both $d = 2$ and $d = 3$ [43].

Remark 13 (Lack of regularity and weak solutions)
For MCF and TVF smooth solutions to PDE (49) exist only under special circumstances. This lack of regularity is an advantage in image processing to preserve step edges and plateaus in images, yet it forces us to define a concept of weak solutions.
Here, we distinguish between MCF and TVF. For MCF one relies on viscosity solution theory developed by Evans-Spruck [26], see also [32,50] for the case of MCF with Neumann boundary conditions. In [13, Thm 3.6] existence of C 1 -viscosity solutions is shown for d = 2.

Remark 14
In this article we do not address convergence of our PDE solutions toward the sub-Riemannian setting $e \downarrow 0$; we only focus on convergence results for $\varepsilon \downarrow 0$. In previous work (by others), convergence to the sub-Riemannian setting is addressed for special cases. For the special case $(a, b) = (0, 0)$, convergence of the solutions with respect to $e \downarrow 0$ is clear from the exact solutions, see [43, Ch. 2.7]. For such convergence in the challenging case $(a, b) = (1, 1)$ (MCF), see Citti et al. [4,13]. For eikonal PDEs, convergence of viscosity solutions toward the sub-Riemannian setting holds as well, see [24, Thm. 2]. It is therefore interesting to see whether convergence results toward the sub-Riemannian setting hold in the general case, including the TVF case, but this falls outside the scope of this article. In Sect. 4 we only focus on convergence results for $\varepsilon \downarrow 0$ with $e > 0$ fixed.

Numerics
We implemented PDE system (49) by Euler-forward time discretization, relying on standard B-spline or linear interpolation techniques for the derivatives in the underlying tools on $\mathbb{M}_d$ given in Table 2. For details see [17,30]. Explicit upper bounds for stable choices of step sizes can be derived by the Gershgorin circle theorem [17,30].
For $d = 2$ the discretization is straightforward [30]; for $d = 3$ we discretized per [17] in the software package Lie Analysis for Mathematica, developed by Martin et al. [37], applied to our PDEs of interest (49) on $\mathbb{M}_3$. The Euler-forward discretizations are not unconditionally stable. For $a = b = 0$, the Gershgorin circle theorem [17, Ch. 4.2] gives a stability bound $(\Delta t)_{\mathrm{crit}}$ (50) on the time step when using linear interpolation with spatial stepsize $h$ and angular stepsize $h_a$. In our experiments, for $d = 2$ we set $h = 1$, and for $d = 3$ we took $h_a = \pi/25$, using an almost uniform spherical sampling from a tessellated icosahedron with $N_A = 162$ points. TVF requires smaller time steps as $\varepsilon$ decreases: keeping in mind (50) and applying the product rule (42) to the case $0 < \varepsilon \ll 1$, we concentrate on the first term, as it is of order $\varepsilon^{-1}$ when the gradient vanishes. We then find $\Delta t \leq \varepsilon \cdot (\Delta t)_{\mathrm{crit}}$ for TVF. For MCF we do not have this limitation.
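For reference, one explicit Euler step of (49) on $\mathbb{M}_2$ can be sketched as follows, reusing the `left_invariant_derivatives` helper from Sect. 2 (again with periodic boundaries standing in for the Neumann conditions of Definition 7; parameter names and defaults are ours).

```python
import numpy as np

def pde_step(W, thetas, dt, a=0, b=1, eps=0.05, D_S=1.0, D_A=0.1, e=1.0):
    """One Euler-forward step of (49) on M_2:
        dW/dt = ||grad W||_eps^a * div( grad W / ||grad W||_eps^b ),
    with ||v||_eps = sqrt(eps^2 + ||v||_G^2). Choices (a, b) = (0, 1),
    (1, 1), (0, 0) give TVF, MCF and linear diffusion, respectively.
    dt must respect the stability bound; for TVF, dt <= eps * dt_crit."""
    A = left_invariant_derivatives(W, thetas)
    D = (e * D_S, D_S, D_A)
    norm = np.sqrt(eps**2 + sum(Di * Ai**2 for Di, Ai in zip(D, A)))
    # gradient components in the left-invariant frame: (grad W)^i = D_i A_i W
    v = [Di * Ai / norm**b for Di, Ai in zip(D, A)]
    # div v = sum_i A_i v^i, applied componentwise to the frame components
    div = sum(left_invariant_derivatives(vi, thetas)[i]
              for i, vi in enumerate(v))
    return W + dt * norm**a * div
```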
While both the quantitative and qualitative results of our proposed methods are encouraging, we have to end our numerics section by mentioning the computational cost. In Table 3 we summarize the relative computation time of our methods versus spatial Perona–Malik; this summary shows our methods being several orders of magnitude slower. In the interest of fairness we add that benchmarking our prototype Mathematica implementation against the built-in Perona–Malik implementation is not a fair comparison; we are confident that an optimized native implementation would fare much better in a performance comparison.

Gradient Flow and Convergence
In this section we provide a gradient flow formulation that we will use to prove the convergence of our regularization scheme for TVF. The reader who is more interested in the experimental results than the technical convergence results can safely choose to skip this section and continue reading Sect. 5.

Preliminaries
The total variation flow can be seen as a gradient flow of a lower-semicontinuous, convex functional in a Hilbert space, as we explain next.
If $F : H \to [0, \infty]$ is a proper (i.e., not everywhere equal to infinity), lower semicontinuous, convex functional on a Hilbert space $H$ (not to be confused with the subgroup $H_d$ above; as we will not need the subgroup anymore, we will stick with convention and use $H$ for the Hilbert space from now on), the subdifferential of $F$ at a point $u$ in the finiteness domain of $F$ is defined as
$$\partial F(u) := \{ z \in H \mid F(v) \geq F(u) + (z, v - u) \ \text{ for all } v \in H \}.$$
The subdifferential is closed and convex, and thereby it has an element of minimal norm, called "the gradient of $F$ in $u$" and denoted by $\mathrm{grad}\, F(u)$. Let $u_0$ be in the closure of the finiteness domain of $F$. By Brezis–Komura theory [9], there then exists a unique gradient-flow solution, i.e., a locally absolutely continuous curve $u : [0, \infty) \to H$ with $u(0) = u_0$ and $u'(t) = -\mathrm{grad}\, F(u(t))$ for almost every $t > 0$.
We say that a functional $\Phi : H \to (-\infty, \infty]$ is $1/\tau$-convex if
$$u \mapsto \Phi(u) - \frac{1}{2\tau}\, \|u\|^2$$
is convex. In that case, the functional $u \mapsto \Phi(u) - \frac{1}{2\tau} \|u - v\|^2$ is convex as well, for arbitrary $v \in H$, because the latter functional deviates from the first by an affine functional. We first prove a stability estimate for the minimization of $1/\tau$-convex functionals.

Lemma 2 Let $\tau > 0$. If a functional $\Phi : H \to (-\infty, \infty]$ on $H$ is $1/\tau$-convex, and $u_*$ is its unique minimizer, then for all $u \in H$,
$$\Phi(u) \geq \Phi(u_*) + \frac{1}{2\tau}\, \|u - u_*\|^2.$$
This lemma is an extension of a standard result regarding strongly convex functionals (see, e.g., [41, Thm. 2.1.7]), but with no assumptions on differentiability. We include the proof in "Appendix B." For a proper (i.e., not everywhere equal to $\infty$), lower semicontinuous, convex functional $F$, and $\tau > 0$, define the so-called proximal operator [48] $J^F_\tau : H \to H$ by
$$J^F_\tau[u_0] := \underset{u \in H}{\operatorname{argmin}} \left( F(u) + \frac{1}{2\tau}\, \|u - u_0\|^2 \right).$$
We provide the proof in "Appendix C." The idea is that the stability estimate in Lemma 2 allows us to conclude that $J^F_\tau[u_0]$ and $J^G_\tau[v_0]$ are close when $u_0$ and $v_0$ are close. By iterating the operators $J^F_\tau$ and $J^G_\tau$, we approximate the gradient flows of $F$ and $G$, respectively, and from the slope estimate (51) we derive that this approximation is uniform. This allows us to derive bounds for the gradient flows from the bounds for $J^F_\tau$ and $J^G_\tau$. We now know that the gradient flows of $F$ and $G$ are close when the slopes $|\partial F|(u_0)$ and $|\partial G|(v_0)$ are bounded. This assumption can be rather stringent. We will relax it and merely require that $F(u_0)$ and $G(v_0)$ are bounded by some constant $E > 0$, in exchange for a bound between the gradient flows that is slightly worse. Our approach will be to run the gradient flow for a small time $s$ from $u_0$ and $v_0$ and use the regularizing property of the gradient flow to conclude a slope bound. On the other hand, if $s$ is small, $u(s)$ and $v(s)$ will be close to $u_0$ and $v_0$. We will then choose $s$ (almost) optimally to derive a bound between the gradient flows.
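As a toy illustration of the proximal iteration behind this argument (our own example, not part of the article's analysis), take $H = \mathbb{R}^k$ and $F = \|\cdot\|_1$, whose proximal operator is soft-thresholding; iterating $J^F_{t/n}$ $n$ times approximates the gradient flow, in line with the minimizing-movement scheme used above.

```python
import numpy as np

def prox_l1(u0, tau):
    """Proximal operator J^F_tau for F(u) = ||u||_1 on H = R^k:
    argmin_u F(u) + ||u - u0||^2 / (2 tau), i.e., soft-thresholding."""
    return np.sign(u0) * np.maximum(np.abs(u0) - tau, 0.0)

def gradient_flow_l1(u0, t, n):
    """Minimizing-movement approximation (J^F_{t/n})^n [u0] of the
    gradient flow of F = ||.||_1; Brezis--Komura theory guarantees
    convergence of such schemes as n -> infinity."""
    u = np.asarray(u0, dtype=float)
    for _ in range(n):
        u = prox_l1(u, t / n)
    return u

# each component moves toward 0 at unit speed until it sticks at 0:
print(gradient_flow_l1([2.0, -0.5, 0.1], t=1.0, n=100))  # ~[1., 0., 0.]
```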

Strong L 2 -Convergence of TV Flows
We prove the convergence, stability, and accuracy of TV flows by considering them as the gradient flows of the family of functionals $TV_\varepsilon$. The theory of contraction semigroups [2, Ch. 4] allows us to show that as $\varepsilon \downarrow 0$ the gradient flow of $TV_\varepsilon$ converges to the gradient flow of $TV_0$ in the $L^2$ sense.
Just like Lemma 1 and Proposition 1 the following theorem was already announced in [25] but similarly lacked a proof, which we now include.
Proof By the evolution variational inequality [2, Theorem 4.0.4 (iii)], we know that for all $s > 0$: by the regularizing property [2, Theorem 4.0.4 (ii)]: and: where $u_*$ minimizes $F$ and $v_*$ minimizes $G$. Because the gradient flow is a nonexpansive semigroup [2, Theorem 4.0.4 (iv)], we obtain: Now assume $t < E^6 M^6 / \delta^9$. We will want to choose $s$ (almost) optimally, depending on $t$. We choose: and note that, with $L := M/s$, we have: By the slope estimates (54) we can apply Proposition 1 to the gradient flows starting at $u(s)$ and $v(s)$, to obtain: If, for the general result of Theorem 1, we take $F = TV_0$, $G = TV_\varepsilon$ and $\delta = \varepsilon\, |\Omega|$, we obtain the following result.

Experiments
In our experiments, we aim to enhance contour and fiber trajectories in medical images and to remove noise. Lifting the image $f : \mathbb{R}^d \to \mathbb{R}$ toward its orientation lift $U : \mathbb{M}_d \to \mathbb{R}$, defined on the space of positions and orientations $\mathbb{M}_d = \mathbb{R}^d \rtimes S^{d-1}$, preserves crossings [30] and avoids leakage of wavefronts [24].
For our experiments for d = 3 the initial condition U : M 3 → R + is a fiber orientation density function (FODF) obtained from DW-MRI data [44].
For our experiments for d = 2 the initial condition U is an invertible orientation score (OS) that we sampled on 8 equidistant orientations.
Finally, we include denoising experiments with qualitative and quantitative results, where comparison with the well-known denoising technique BM3D [18] shows advantages and good results. (Fig. 9: qualitative comparison of over-smoothed collagen images starting from a noisy image with $\sigma = 0.2$ Gaussian noise; images are smoothed for twice the time needed to reach their minimal $L^2$ error per Fig. 8, using isotropic processing with coherence enhancement.)

Image Enhancement/Denoising
In accordance with the workflow in Fig. 1, we go through the following steps:
- compute the orientation score $U = W_\psi f$ of the input image $f$, recall (2);
- evolve it with the flow operator, $W(\cdot, t) = \Phi_t(U)$, per (49);
- reconstruct the enhanced image by integration over orientations, $f_t(x) := \int_{S^1} W(x, n, t)\, \mathrm{d}\sigma(n)$,
for $t \geq 0$. With respect to the final step, we recall that we use cake wavelets that allow for sharp approximate reconstruction by integration over angles only. Here $U \mapsto W(\cdot, t) = \Phi_t(U)$ denotes the flow operator (49) on $\mathbb{M}_2$. Hence the initial condition for our TVF/MCF PDE (49) is set by an orientation score of the image $f : \mathbb{R}^2 \to \mathbb{R}$, given by (2).
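Chaining the earlier sketches gives the following minimal version of this pipeline (with `f_noisy`, `psi` and `n_steps` assumed to be defined; the step size and parameters are illustrative only).

```python
import numpy as np

# lift, flow, project: a minimal end-to-end run of the workflow above
thetas, U = orientation_score_2d(f_noisy, psi, n_orientations=8)
W = np.moveaxis(U, 0, -1)                      # reorder to W[x, y, theta]
for _ in range(n_steps):
    W = pde_step(W, thetas, dt=0.01, a=0, b=1)     # (a, b) = (0, 1): TVF
f_t = W.mean(axis=-1)                          # integrate over orientations
```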
By the invertibility of the orientation score one has $f = f_0$, so all flows depart from the original image. We refer to the different methods we experimented with by the following terms.
- Left-invariant: we use the left-invariant geometry per the first column of Table 1.
- Gauge: we use the locally-adaptive-frames geometry per the second column of Table 1.
- Isotropic: we set $e = 1$.
So, for example, isotropic gauge TVF with coherence enhancement equates to setting $(a, b) = (0, 1)$ and $e = 1$, including the $E$ operator in the PDE, and using the second column of Table 1 to define our geometric objects. For quantitative comparison we will look at relative $L^1$ and $L^2$ errors, meaning that if we have a (clean) source image $f_{\mathrm{source}}$ and a denoised image $f_t$ that has been processed up to time $t$, we calculate the relative error as
$$\epsilon_t := \frac{\| f_t - f_{\mathrm{source}} \|}{\| f_{\mathrm{source}} \|},$$
with the corresponding $L^1$ or $L^2$ norm. We will test against two types of noise: Gaussian and correlated.
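In code, the relative error is a one-line helper (ours, matching the formula above):

```python
import numpy as np

def rel_error(f_t, f_source, ord=2):
    """Relative L^1 (ord=1) or L^2 (ord=2) error of f_t w.r.t. f_source."""
    diff = (f_t - f_source).ravel()
    return np.linalg.norm(diff, ord) / np.linalg.norm(f_source.ravel(), ord)
```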

Gaussian Noise
We apply Gaussian noise with standard deviation $0.2$ to our normalized (to $[0, 1]$) source image; the original and noisy images are shown in Fig. 9a and b, respectively.
In Fig. 6 we show how the errors progress with t ≥ 0 for the isotropic (e = 1) case without coherence enhancement (i.e., without E). For comparison we plot the same error with spatial Perona-Malik. While Perona-Malik is clearly more stable and resilient to over-smoothing, both our proposed methods have much smaller minimal errors.

Remark 16 (Interpretation of timescales)
The different methods work on different timescales; we rescale these to be able to plot the results together, and no meaning should be attributed to one method attaining its minimum earlier than another. The error graphs just show:
- how large the minimal error is, and
- how fast the image deteriorates after this minimum has been reached.
In our next experiment we increase anisotropy by setting $e = 0.25$; the resulting errors are plotted in Fig. 7. We gain no improvement in minimal error while requiring more computational cycles to reach the minimum, from which we conclude that for this application isotropic processing is more desirable.
In Fig. 8 we show the errors for the isotropic setup with coherence enhancement included (for $c = 0.2$). We get a very minor improvement in minimal errors and a decent improvement in over-smoothing stability, although still not on the level of Perona–Malik. It is remarkable that with coherence enhancement included the data-adaptive geometry is less stable than the left-invariant geometry; we observe that combining two different methods of adapting to the data is counterproductive in this instance.
For a qualitative comparison of the different isotropic methods with coherence enhancement, we over-smooth the collagen image past the time of its lowest $L^2$ error by a factor of two; the corresponding qualitative results are shown in Fig. 9.

Correlated Noise
For correlated noise we apply a Gaussian filter with $\sigma = 1.0$ to Gaussian noise with $\sigma = 0.2$. The error evolution for the isotropic methods is plotted in Fig. 10. We observe that MCF performs worse in this setting against correlated noise; both in minimal error and in stability it does not do as well as spatial Perona–Malik. TVF, on the other hand, has a better minimal error than Perona–Malik at the cost of stability. The stability does somewhat improve if we turn on the locally adaptive frames. (Fig. 12: qualitative comparison against correlated noise, smoothed twice the optimal time per Fig. 11, using isotropic processing with coherence enhancement. Fig. 13: comparison of left-invariant TVF with coherence enhancement against BM3D; the BM3D image is denoised at $\sigma = 0.12$ ($1.5\times$ optimal $L^2$) and the left-invariant TVF is run for $1.5\times$ the optimal-$L^2$ time; pay particular attention to the diagonal edges, where TVF exhibits superior performance.)
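For reproducibility, correlated noise of this kind can be generated as follows (our sketch; `f` is the clean normalized image and the seed is arbitrary):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# correlated noise as described above: Gaussian-filtered (sigma = 1.0)
# Gaussian white noise with standard deviation 0.2
rng = np.random.default_rng(0)
f_noisy = f + gaussian_filter(rng.normal(0.0, 0.2, size=f.shape), sigma=1.0)
```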
The error evolution of the experiment including the use of coherence enhancement is displayed in Fig. 11. Overall this improves the results, but MCF still exhibits an inferior performance in this setting. TVF sees both an improvement in minimal error and stability. As with the Gaussian noise experiment we see that turning on both coherence enhancement and locally adaptive frames is counterproductive.
A qualitative comparison of the methods against correlated noise is shown in Fig. 12, where again we smooth twice the optimal time. We observe the same general trend as in Fig. 9; all methods do a good job of preserving edges, but TVF stands out in clearing the plateaus.
As a final method to compare against, we look at BM3D [18]. Both BM3D and our methods depend on some prior knowledge for optimal performance: BM3D requires us to know the standard deviation of the noise, and our methods require us to know the optimal processing time. (Table 4: bold values indicate the best result per image/noise combination.) Three images were tested: the first being the collagen image we have been using, the second a spiral test image, and the third a grayscale Mona Lisa.
We make a qualitative comparison of removing correlated noise between BM3D and left-invariant TVF by smoothing $1.5\times$ past the optimal $L^2$ error: $1.5\times$ the optimal standard deviation in the case of BM3D and $1.5\times$ the optimal processing time in the case of TVF. The resulting images are shown in Fig. 13.
For a broader comparison, we compute peak signal-to-noise ratios (PSNR) for the collagen image we already saw and for two additional images of different styles shown in Fig. 14. Results are summarized in Table 4.
As the gauge TVF method compares favorably against correlated noise according to the PSNR values in Table 4, we look at its qualitative result and compare it against BM3D in Fig. 15.

Denoising and Fiber Enhancement on FODFs in DW-MRI
In DW-MRI image processing one obtains a field of angular diffusivity profiles (an orientation density function) of water molecules. A high diffusivity in a particular orientation correlates with biological fiber structure in brain white matter along that same direction. Crossing-preserving enhancement of FODF fields $U : \mathbb{M}_3 \to \mathbb{R}^+$ helps to better identify structural pathways in brain white matter, which is relevant for surgery planning, see for example [38,44]. For a quantitative comparison we applied TVF, MCF, and PM diffusion [17] to denoise a popular synthetic FODF $U : \mathbb{M}_3 \to \mathbb{R}^+$ from the Fiberfox Tractometer challenge with realistic noise profiles [40]. In Fig. 16, we can observe the many crossing fibers in the dataset. Furthermore, we depicted the absolute $L^2$-error $t \mapsto \|U - \Phi_t(U)\|_{L^2(\mathbb{M}_3)}$ as a function of the evolution parameter $t$, where $\Phi_t(U) = W_\varepsilon(\cdot, t)$ with optimized $\varepsilon = 0.02$ in the case of TVF (in green) and MCF (in blue), and where $\Phi_t$ is the PM diffusion evolution [17] on $\mathbb{M}_3$ with optimized PM parameter $K = 0.2$ (in red). We also depict results for $K = 0.1, 0.4$ (with the dashed lines) and $\varepsilon = 0.01, 0.04$. We see that the other parameter settings provide on average worse results, justifying our optimized parameter settings. We set $D_S = 1.0$, $D_A = 0.001$, $\Delta t = 0.01$. We observe that:
- TVF can reach lower error values than MCF with adequate $\Delta t = 0.01$,
- MCF provides more stable errors for all $t > 0$ than TVF with respect to $\varepsilon > 0$,
- TVF and MCF produce lower error values than PM diffusion,
- PM diffusion provides the most variable results for all $t > 0$.
For a qualitative comparison we applied TVF, MCF, PM diffusion and linear diffusion to a FODF field $U : \mathbb{M}_3 \to \mathbb{R}^+$ obtained from a standard DW-MRI dataset (with $b = 1000\ \mathrm{s/mm^2}$, 54 gradient directions) via constrained spherical deconvolution (CSD) [19,52]. See Fig. 17, where for each method we used the parameter settings that were optimal on the artificial dataset. We see that:
- all methods perform well on the real datasets; contextual alignment of the angular profiles better reflects the anatomical fiber bundles,
- MCF and TVF better preserve boundaries and angular sharpness,
- MCF better preserves the amplitude at crossings at longer times.

Conclusion
We have proposed a PDE system on the homogeneous space $\mathbb{M}_d = \mathbb{R}^d \rtimes S^{d-1}$ of positions and orientations for crossing-preserving denoising and enhancement of (lifted) images containing both complex elongated structures and plateaus. It includes TVF, MCF and diffusion flows as special cases and includes (sub-)Riemannian geometry. Thereby we generalized recent related works by Citti et al. [13] and Chambolle and Pock [11] from 2D to 3D, using a different numerical scheme with new convergence results (Theorem 1) and stability bounds. We used the divergence and intrinsic gradient on a (sub-)Riemannian manifold above $\mathbb{M}_d$ for a formal weak formulation of total variation flows, which simplifies if the lifted images are differentiable (Lemma 1).

(Fig. 15: comparing gauge TVF with coherence enhancement and BM3D against correlated noise; the standard deviation for BM3D and the evolution time for TVF were set at 1.5 times the value needed to reach the optimal $L^2$ error. Fig. 16: quantitative comparison of denoising a fiber orientation density function (FODF), obtained by CSD [19,52], from the benchmark DW-MRI dataset Fiberfox [40].)
For 2D image denoising and enhancement we have shown that in all cases TVF on $\mathbb{M}_2$ has a better minimal error than Perona–Malik and MCF, at the cost of being more sensitive to over-smoothing, recall Figs. 6, 7, 8, 9, 10, 11, and 12. The $L^1$, $L^2$ and PSNR measures indicate the potential of our proposed methods for denoising, and we manage to improve PSNR results over methods such as BM3D against correlated noise on some images, recall Figs. 13 and 15. Qualitatively this is mainly reflected in better clearing of plateaus while still preserving hard edges and crossings.
In 3D we compared to previous nonlinear crossing-preserving diffusion methods on $\mathbb{M}_3$; we showed improvements over Perona–Malik and over contextual fiber enhancement methods in DW-MRI processing [17,21] on real medical image data. We observe that crossings and boundaries (of bundles and plateaus) are better preserved over time. We support this quantitatively by a denoising experiment on a benchmark DW-MRI dataset, where MCF performs better than TVF and both perform better than Perona–Malik diffusion, in view of error reduction and stability. (Fig. 17: qualitative comparison of denoising a FODF obtained by CSD [19,52] from a standard DW-MRI dataset with $b = 1000\ \mathrm{s/mm^2}$ and 54 gradient directions; for the CSD we used up to 8th-order spherical harmonics, and the FODF is spherically sampled on a tessellation of the icosahedron with 162 orientations.)
Altogether, we conclude that our TVF and MCF methods on $\mathbb{M}_d$ work well for denoising and enhancement for both $d = 2$ and $d = 3$. In general we see clear benefits from the inclusion of locally adaptive frames and from a limited inclusion of coherence enhancement. The code from our experiments is available as a Mathematica notebook at https://bmnsmets.com/files/tvf_mcf_denoising_jmiv.nb.
Future work While we have shown the potential of our PDE system on $\mathbb{M}_d$ as a denoising/enhancement method, some challenges remain for future work:
- Determining the stopping time: our methods show good minimal errors but are prone to degrading the image if left running for too long. For general applications a robust automatic stopping method would be helpful. Spectral analysis of nonlinear operators [10,16] may apply here.
- Coherence enhancement [54] was not originally conceived for denoising. It is therefore interesting to see how edge-enhancing diffusion (EED) [27] performs when generalized to $\mathbb{M}_d$, i.e., we would reformulate our enhancement operator $E$ accordingly and test its performance.
- In this article we obtained convergence results of our PDE solutions for $\varepsilon \downarrow 0$ while keeping $e > 0$ fixed. It is interesting to study the full limiting case $(\varepsilon, e) \to (0, 0)$ in the general setting covering total variation flow.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Proof (of Proposition 1) Since the functional $u \mapsto F(u) + \frac{1}{2\tau}\|u - u_0\|^2$ is $1/\tau$-convex, it follows by Lemma 2 that: Now we use that $J^F_\tau$ is nonexpansive [2, Eq. (4.0.2)], so: We conclude that: By iterating this estimate, we derive: The a priori estimate [2, Theorem 4.0.4 (v)] yields that the gradient flows $u$ and $v$ of $F$ and $G$, respectively, are approximated well by $(J^F_{t/n})^n[u_0]$ and $(J^G_{t/n})^n[v_0]$. More precisely, for $t > 0$ and $n > 0$, the a priori estimate gives: By these a priori estimates and the estimate for the discrete flows (61), we see that: To derive the final estimates, we need to make good choices for $n$. If $0 \leq t \leq \delta/L^2$, we take $n = 1$ and obtain: If $t > \delta/L^2$, we choose $n = \lceil L^{2/3} (t/\delta)^{1/3} \rceil$, which is larger than or equal to 2. In that case, $n/2 \leq n - 1 < L^{2/3} (t/\delta)^{1/3} \leq n$.
We then obtain the claimed bound. $\square$