Analysis of (sub-)Riemannian PDE-G-CNNs

Group equivariant convolutional neural networks (G-CNNs) have been successfully applied in geometric deep learning. Typically, G-CNNs have the advantage over CNNs that they do not waste network capacity on training symmetries that should have been hard-coded into the network. The recently introduced framework of PDE-based G-CNNs (PDE-G-CNNs) generalizes G-CNNs. PDE-G-CNNs have the core advantages that they simultaneously (1) reduce network complexity, (2) increase classification performance, and (3) provide geometric interpretability. Their implementations primarily consist of linear and morphological convolutions with kernels. In this paper, we show that the previously suggested approximative morphological kernels do not always approximate the exact kernels accurately. More specifically, depending on the spatial anisotropy of the Riemannian metric, we argue that one must resort to sub-Riemannian approximations. We solve this problem by providing a new approximative kernel that works regardless of the anisotropy. We provide new theorems with better error estimates of the approximative kernels, and prove that they all carry the same reflectional symmetries as the exact ones. We test the effectiveness of multiple approximative kernels within the PDE-G-CNN framework on two datasets, and observe an improvement with the new approximative kernels. We report that PDE-G-CNNs again allow for a considerable reduction of network complexity while having comparable or better performance than G-CNNs and CNNs on the two datasets. Moreover, PDE-G-CNNs have the advantage of better geometric interpretability over G-CNNs, as the morphological kernels are related to association fields from neurogeometry.


Introduction
Many classification, segmentation, and tracking tasks in computer vision and digital image processing require some form of "symmetry". Think, for example, of image classification. If one rotates, reflects, or translates an image, the classification stays the same. We say that an ideal image classification is invariant under these symmetries.
A slightly different situation is image segmentation. In this case, if the input image is changed in some way, the output should change accordingly. Therefore, an ideal image segmentation is equivariant with respect to these symmetries.
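As a minimal, self-contained illustration of this notion (a toy example of our own, not part of the original exposition): periodic convolution is translation-equivariant, so filtering a shifted image equals shifting the filtered image, which one can check numerically.

```python
import numpy as np
from scipy.ndimage import convolve

# Translation equivariance of (periodic) convolution: filtering a shifted
# image gives the same result as shifting the filtered image.
rng = np.random.default_rng(seed=0)
image = rng.random((32, 32))
kernel = rng.random((5, 5))
shift = (3, 7)

a = convolve(np.roll(image, shift, axis=(0, 1)), kernel, mode="wrap")
b = np.roll(convolve(image, kernel, mode="wrap"), shift, axis=(0, 1))
assert np.allclose(a, b)  # holds exactly thanks to the periodic boundary
```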
Many computer vision and image processing problems are currently being tackled with neural networks (NNs). It is desirable to design neural networks in such a way that they respect the symmetries of the problem, i.e. to make them invariant or equivariant. Think, for example, of a neural network that detects cancer cells. It would be disastrous if, by slightly translating an image, the neural network gave a totally different diagnosis, even though the input is essentially the same.
One way to make networks equivariant or invariant is to simply train them on more data. One could take the training dataset and augment it with translated, rotated, and reflected versions of the original images. This approach, however, is undesirable: invariance or equivariance is still not guaranteed, and the training takes longer. It would be better if the networks were inherently invariant or equivariant by design. This avoids a waste of network capacity, guarantees invariance or equivariance, and increases performance, see for example [1].
More specifically, many computer vision and image processing problems are tackled with convolutional neural networks (CNNs) [2][3][4]. Convolutional neural networks have the property that they inherently respect, to some degree, translation symmetries. CNNs do not, however, take into account rotational or reflection symmetries. Cohen and Welling introduced group equivariant convolutional neural networks (G-CNNs) in [5] and designed a classification network that is inherently invariant under 90-degree rotations, integer translations, and vertical/horizontal reflections. Much work is being done on invariant/equivariant networks that exploit inherent symmetries; a non-exhaustive list is [1, …]. The idea of including geometric priors, such as symmetries, into the design of neural networks is called 'Geometric Deep Learning' in [27].
In [28], partial differential equation (PDE) based G-CNNs are presented, aptly called PDE-G-CNNs. In fact, G-CNNs are shown to be a special case of PDE-G-CNNs (if one restricts the PDE-G-CNNs to convection only, using many transport vectors [28, Sec. 6]). In PDE-G-CNNs the usual non-linearities that are present in current networks, such as the ReLU activation function and max-pooling, are replaced by solvers for specifically chosen non-linear evolution PDEs. Figure 1 illustrates the difference between a traditional CNN layer and a PDE-G-CNN layer.
The PDEs that are used in PDE-G-CNNs are not chosen arbitrarily: they come directly from the world of geometric image analysis, and thus their effects are geometrically interpretable. This makes PDE-G-CNNs more geometrically meaningful and interpretable than traditional CNNs. Specifically, the PDEs considered are diffusion, convection, dilation, and erosion. These four PDEs correspond to the common notions of smoothing, shifting, max-pooling, and min-pooling. They are solved by linear convolutions, resamplings, and so-called morphological convolutions. Figure 2 illustrates the basic building block of a PDE-G-CNN.
One shared property of G-CNNs and PDE-G-CNNs is that the input data usually needs to be lifted to a higher-dimensional space. Take, for example, the case of image segmentation with a convolutional neural network, where we model/idealize the images as real-valued functions on R^2. If we keep the data as functions on R^2 and want the convolutions within the network to be equivariant, then the only ones that are allowed are convolutions with isotropic kernels [29, p. 258]. This type of shortcoming generalizes to other symmetry groups as well [12, Thm. 1]. One can imagine that this constraint is too restrictive to work with, and that is why we lift the image data.
Within the PDE-G-CNN framework the input images are considered real-valued functions on R^d, the desired symmetries are represented by the Lie group of roto-translations SE(d), and the data is lifted to the homogeneous space of d-dimensional positions and orientations M_d. It is on this higher-dimensional space that the evolution PDEs are defined, and the effects of diffusion, dilation, and erosion are completely determined by the Riemannian metric tensor field G that is chosen on M_d. If this Riemannian metric tensor field G is left-invariant, the overall processing is equivariant; this follows by combining techniques in [30, Thm. 21, Chpt. 4] and [31, Lem. 3, Thm. 4].
The Riemannian metric tensor field G we will use in this article is left-invariant and determined by three nonnegative parameters: w_1, w_2, and w_3. The definition can be found in the preliminaries, Section 2, Equation (4). It is exactly these three parameters that are optimized during the training of a PDE-G-CNN. Intuitively, the parameters respectively regulate the cost of main spatial, lateral spatial, and angular motion.

Fig. 1: The difference between a traditional CNN layer and a PDE-G-CNN layer. In contrast to traditional CNNs, the layers in a PDE-G-CNN do not depend on ad-hoc non-linearities like ReLUs, and are instead implemented as solvers of (non)linear PDEs. What the PDE evolution block consists of can be seen in Figure 2.

Fig. 2: Overview of a PDE evolution block. Convection is solved by resampling, diffusion is solved by a linear group convolution with a certain kernel [28, Sec. 5.2], and dilation and erosion are solved by morphological group convolutions (3) with a morphological kernel (1).

An important quantity in the analysis of this paper is the spatial anisotropy ζ := w_2/w_1, as will become clear later. In this article we only consider the two-dimensional case, i.e. d = 2. In this case, the elements of both M_2 and SE(2) can be represented by three real numbers: (x, y, θ) ∈ R^2 × [0, 2π). In the case of M_2, x and y represent a position and θ represents an orientation. Throughout the article we take p_0 := (0, 0, 0) ∈ M_2 as our reference point in M_2. In the case of SE(2), x and y represent a translation and θ a rotation.
As already stated, within the PDE-G-CNN framework images are lifted to the higher-dimensional space of positions and orientations M_d. There are a multitude of ways of achieving this, but there is one very natural way to do it: the orientation score transform [30,32-34]. In this transform we pick a point (x, y) ∈ R^2 in an image and determine how well a certain orientation θ ∈ [0, 2π) fits the chosen point. In Figure 3 an example of an orientation score is given. We refer to [34, Sec. 2.1] for a summary of how an orientation score transform works.
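As a rough sketch of the idea (the actual transform uses proper cake wavelets, see [34, Sec. 2.1]; the elongated Gaussian filters below are a simplification of our own, purely for illustration), one can lift an image by correlating it with a rotated line-detecting filter for each sampled orientation:

```python
import numpy as np
from scipy.signal import fftconvolve

def orientation_score(image, n_orientations=8, size=15,
                      sigma_long=5.0, sigma_short=1.5):
    # Lift a 2D image to a function on (a sampling of) M_2 by filtering
    # with an elongated Gaussian rotated over n_orientations angles.
    half = size // 2
    xs, ys = np.meshgrid(np.arange(-half, half + 1), np.arange(-half, half + 1))
    thetas = np.linspace(0.0, np.pi, n_orientations, endpoint=False)
    score = np.empty((n_orientations,) + image.shape)
    for k, th in enumerate(thetas):
        u = xs * np.cos(th) + ys * np.sin(th)    # coordinate along the orientation
        v = -xs * np.sin(th) + ys * np.cos(th)   # coordinate perpendicular to it
        filt = np.exp(-0.5 * ((u / sigma_long) ** 2 + (v / sigma_short) ** 2))
        filt /= filt.sum()
        score[k] = fftconvolve(image, filt, mode="same")
    return score

img = np.zeros((64, 64))
img[32, 10:54] = 1.0                 # a horizontal line
U = orientation_score(img)           # shape (8, 64, 64); the line responds
print(U[:, 32, 32].round(3))         # most strongly at its own orientation
```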
Inspiration for using orientation scores comes from biology. The Nobel laureates Hubel and Wiesel found that many cells in the visual cortex of cats have a preferred orientation [35,36]. Moreover, a neuron that fires for a specific orientation excites neighboring neurons that have an "aligned" orientation. Petitot and Citti-Sarti proposed a model [37,38] for the distribution of the orientation preference and this excitation of neighbors, based on sub-Riemannian geometry on M_2. They relate the phenomenon of preference for aligned orientations to the concept of association fields [39], which model how a specific local orientation places expectations on surrounding orientations in human vision. Figure 4 provides an impression of such an association field.
As shown in [42, Fig. 17], association fields are closely approximated by (projected) sub-Riemannian geodesics in M_2, for which optimal synthesis has been obtained by Sachkov and Moiseev [43,44]. Furthermore, in [45] it is shown that the Riemannian geodesics in M_2 converge to the sub-Riemannian geodesics when the spatial anisotropy ζ of the metric increases. This shows that in practice one can approximate the sub-Riemannian model by Riemannian models. Figure 5 shows the relation between association fields and sub-Riemannian geometry in M_2.

Fig. 4: Association field lines from neurogeometry [37, Fig. 43], [39, Fig. 16]. Such association field lines can be well approximated by spatially projected sub-Riemannian geodesics in M_2 [37,38,40,41], [42, Fig. 17].

The relation between association fields and Riemannian geometry on M_2 directly extends to a relation between dilation/erosion and association fields. Namely, performing dilation on an orientation score in M_2 is similar to extending a line segment along its association field lines. Similarly, performing erosion is similar to sharpening a line segment perpendicular to its association field lines. This makes dilation/erosion the perfect candidates for a task such as line completion.

Fig. 6: One sample of the Lines dataset. In 6a we see the input; in 6b the perceived curve that we consider as ground truth (as the input is constructed by interrupting the ground-truth line and adding random local orientations).
In the line completion problem, the input is an image containing multiple line segments, and the desired output is an image of the line that is "hidden" in the input image. Figure 6 shows such an input and desired output. This is also what David Field et al. studied in [39]. We anticipate that PDE-G-CNNs outperform classical CNNs on the line completion problem due to PDE-G-CNNs being able to dilate and erode.
To investigate this we made a synthetic dataset called "Lines", consisting of grayscale 64 × 64 pixel images together with their ground-truth line completion. In Figure 7 a complete abstract overview of the architecture of a PDE-G-CNN performing line completion is visualized. Figure 8 illustrates how a PDE-G-CNN and a CNN incrementally complete a line throughout their layers.
In Proposition 1 we show that solving the dilation and erosion PDEs can be done by performing a morphological convolution with a morphological kernel k_t^α : M_2 → R_{≥0}, which is easily expressed in the Riemannian distance d = d_G on the manifold:

  k_t^α(p) = (t/β) ( d(p_0, p)/t )^β.    (1)

Here p_0 = (0, 0, 0) is our reference point in M_2, and time t > 0 controls the amount of erosion and dilation. Furthermore, α > 1 controls the "softness" of the max- and min-pooling, with 1/α + 1/β = 1. Erosion is done through a direct morphological convolution (3) with this specific kernel. Dilation is solved in a slightly different way but again with the same kernel (Proposition 1 in Section 3 will explain the details).

Fig. 8: The feature maps of the PDE-G-CNN live in M_2, but for clarity we only show the max-projection over θ. Within the feature maps of the PDE-G-CNN, association fields from neurogeometry [37,39,46] become visible as network depth increases. Such merging of association fields is not visible in the feature maps of the CNN. This observation is consistent throughout different inputs.
And this is where a problem arises: calculating the exact distance d on M_2 required in (1) is computationally expensive [47]. To alleviate this issue, we resort to estimating the true distance d with computationally efficient approximative distances, denoted throughout the article by ρ. We then use such a distance approximation within (1) to create a corresponding approximative morphological kernel, and in turn use this kernel to efficiently calculate the effect of dilation and erosion.
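To make the mechanics tangible, here is a minimal sketch of a morphological convolution with a kernel of the power form (1), shown on the translation group on a 1D grid, where g^{-1} p reduces to p − g (on M_2 the infimum instead runs over SE(2) with the group inverse, and d is replaced by the approximation ρ); the quadratic-cost loop is purely illustrative:

```python
import numpy as np

def morph_kernel(d, t=1.0, beta=2.0):
    # kernel of the form (t / beta) * (d / t) ** beta, cf. (1)
    return (t / beta) * (d / t) ** beta

def morphological_convolution(f, dx=1.0, t=1.0, beta=2.0):
    # (k [box] f)(p) = inf_g { k(p - g) + f(g) } on a uniform 1D grid
    idx = np.arange(f.size)
    out = np.empty(f.size)
    for i in idx:
        out[i] = np.min(morph_kernel(np.abs(i - idx) * dx, t, beta) + f)
    return out

f = np.array([9.0, 9.0, 0.0, 9.0, 9.0, 9.0, 9.0])   # a deep "well"
print(morphological_convolution(f))   # erosion spreads the minimum outwards
```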
In [28] one such distance approximation is used: the logarithmic distance estimate ρ_c, which uses the logarithmic coordinates c^i (8). In short, ρ_c(p) is equal to the Riemannian length of the exponential curve that connects p_0 to p. The formal definition will follow in Section 4. In Figure 9 an impression of ρ_c is given.
Clearly, an error is made when the effect of erosion and dilation is calculated with an approximative morphological kernel. As a morphological kernel is completely determined by its corresponding (approximative) distance, it follows that one can analyse the error by analysing the difference between the exact distance d and the approximative distance ρ that is used.

Fig. 9: A visualization of ρ_c, similar to Figure 5. In 9a we see multiple contours of ρ_c, and on the bottom we see the min-projection over θ. The metric parameters are (w_1, w_2, w_3) = (1, 4, 1). In 9b we see the same min-projection together with some corresponding spatially projected exponential curves. Note the similarity to Figure 4.
Despite it being shown in [28] that d ≤ ρ_c, no concrete bounds are given there, apart from the asymptotic ρ_c² ≤ d² + O(d⁴). This motivates us to perform a more in-depth analysis of the quality of the distance approximations.
We introduce a variation on the logarithmic estimate ρ_c, called the half-angle distance estimate ρ_b, and analyse that instead. The half-angle approximation uses not the logarithmic coordinates but the half-angle coordinates b^i, the definition of which is also given later (28). In practice ρ_c and ρ_b do not differ much, but analysing ρ_b is much easier!
The main theorem of the paper, Theorem 1, collects new theoretical results that describe the quality of using the half-angle distance approximation ρ_b for solving dilation and erosion in practice. It relates the approximative morphological kernel k_b corresponding with ρ_b to the exact kernel k (1).
Both the logarithmic estimate ρ_c and the half-angle estimate ρ_b approximate the true Riemannian distance d quite well in certain cases. One of these cases is when the Riemannian metric has a low spatial anisotropy ζ. We can show this visually by comparing the isocontours of the exact and approximative distances. However, interpreting and comparing these surfaces can be difficult. This is why we have decided to additionally plot multiple θ-isocontours of these surfaces. In Figure 10 one such plot can be seen, and it illustrates how such plots must be interpreted.
In Table 1 a spatially isotropic case ζ = 1 and a low-anisotropic case ζ = 2 are visualized. Note that ρ_b approximates d well in these cases. In fact, ρ_b is exactly equal to the true distance d in the spatially isotropic case, which is not true for ρ_c.
Both the logarithmic and half-angle approximations fail specifically in the high spatial anisotropy regime, for example when ζ = 8. The first two columns of Table 2 show that, indeed, ρ_b is then no longer a good approximation of the exact distance d. For this reason we introduce a novel sub-Riemannian distance approximation ρ_b,sr, which is visualized in the third column of Table 2.
Finally, we propose an approximative distance ρ_com that carefully combines the Riemannian and sub-Riemannian approximations into one. This combined approximation automatically switches to the estimate that is more appropriate depending on the spatial anisotropy, and hence covers both the low and high anisotropy regimes. Using the corresponding morphological kernel of ρ_com to solve erosion and dilation, we obtain more accurate (and still tangible) solutions of the non-linear parts of the PDE-G-CNNs.
For every distance approximation (listed in Section 4) we perform an empirical analysis in Section 6 by seeing how the estimate changes the performance of PDE-G-CNNs when applied to two datasets: the Lines dataset and the publicly available DCA1 dataset.

Contributions
In Proposition 1 we summarize how the nonlinear units in PDE-G-CNNs (described by morphological PDEs) are solved using morphological kernels and convolutions, which provides sufficient and essential background for the discussions and results in this paper.
The key contributions of this article are:
• Theorem 1 summarizes our mathematical analysis of the quality of the half-angle distance approximation ρ_b and its corresponding morphological kernel k_b in PDE-G-CNNs. We do this by comparing k_b to the exact morphological kernel k. Globally, one can show that they both carry the same symmetries, and that for low spatial anisotropies ζ they are almost indistinguishable. Furthermore, we show that locally both kernels are similar through an upper bound on the relative error. This improves upon results in [28, Lem. 20].
• Table 2 demonstrates qualitatively that ρ_b becomes a poor approximation when the spatial anisotropy is high, ζ ≫ 1. In Corollary 4 we underpin this theoretically, and in Section 6.1 we validate this observation numerically. This motivates the use of a sub-Riemannian approximation when ζ is large.
• In Section 4 we introduce and derive a novel sub-Riemannian distance approximation ρ_sr that overcomes difficulties in previously existing sub-Riemannian kernel approximations [48]. Subsequently, we propose our approximation ρ_com, which combines the Riemannian and sub-Riemannian approximations into one that automatically switches to the approximation that is more appropriate, depending on the metric parameters.

• Figures 16 and 19 show that PDE-G-CNNs perform just as well as, and sometimes better than, G-CNNs and CNNs on the DCA1 and Lines datasets, while having the fewest parameters. Figures 17 and 20 depict an evaluation of the performance of PDE-G-CNNs when using the different distance approximations, again on the DCA1 and Lines datasets. We observe that the new kernel ρ_b,com provides the best results.
In addition, Figures 4, 5, 8 and 9 show a connection between the PDE-G-CNN framework and the theory of association fields from neurogeometry [37,39]. Thereby, PDE-G-CNNs reveal improved geometric interpretability in comparison to existing convolutional neural networks. In Appendix B we further clarify the geometric interpretability.

Outline
In Section 2 a short overview of the necessary mathematical preliminaries is given. Section 3 collects some known results on the exact solution of erosion and dilation on the homogeneous space of two-dimensional positions and orientations M_2, and motivates the use of morphological kernels. In Section 4 all approximative distances are listed. The approximative distances give rise to corresponding approximative morphological kernels. The main theorem of this paper can be found in Section 5 and consists of three parts, the proofs of which can be found in the relevant subsections. The main theorem mostly concerns itself with the analysis of the approximative morphological kernel k_b. Experiments with the various approximative kernels are performed, and the results can be found in Section 6. Finally, we end the paper with a conclusion in Section 7.

Preliminaries
Table 2: The same as Table 1, but in the high spatially anisotropic case. Alongside the approximation ρ_b, the sub-Riemannian distance approximation ρ_b,sr is plotted with ν = 1.6. We see that the isocontours of ρ_b are too "thin" compared to the isocontours of d. The isocontours of ρ_b,sr are better in this respect.

Coordinates on SE(2) and M_2. Let G = SE(2) = R^2 ⋊ SO(2) be the two-dimensional rigid body motion group. We identify elements g ∈ SE(2) with (x, y, θ) ∈ R^2 × R/(2πZ), for which we always use the small-angle identification R/(2πZ) ≡ [−π, π). The group product is

  g_1 g_2 = ( x_1 + x_2 cos θ_1 − y_2 sin θ_1, y_1 + x_2 sin θ_1 + y_2 cos θ_1, θ_1 + θ_2 mod 2π ),

and the identity is e = (0, 0, 0). The rigid body motion group acts on the homogeneous space of two-dimensional positions and orientations M_2 = R^2 × S^1 via g ⊙ (y, n) = (x + R_θ y, R_θ n) for g = (x, R_θ) ∈ SE(2) and (y, n) ∈ M_2. If context allows it, we may omit writing ⊙ for conciseness. By choosing the reference element p_0 = (0, 0, (1, 0)) ∈ M_2 we have:

  SE(2) ∋ g ↦ g ⊙ p_0 ∈ M_2.    (2)

This mapping is a diffeomorphism and allows us to identify SE(2) and M_2. Thereby we will also freely use the (x, y, θ) coordinates on M_2.

Morphological group convolution. Given functions f_1, f_2 : M_2 → R we define their morphological convolution (or 'infimal convolution') [50,51] by

  (f_1 □ f_2)(p) := inf_{g ∈ SE(2)} { f_1(g^{-1} ⊙ p) + f_2(g ⊙ p_0) }.    (3)

Left-invariant (co-)vector fields on M_2.
Throughout this paper we shall rely on the following basis of left-invariant vector fields:

  A_1 = cos θ ∂_x + sin θ ∂_y,  A_2 = − sin θ ∂_x + cos θ ∂_y,  A_3 = ∂_θ.

The dual frame ω^i is given by ⟨ω^i, A_j⟩ = δ^i_j, i.e.: ω^1 = cos θ dx + sin θ dy, ω^2 = − sin θ dx + cos θ dy, and ω^3 = dθ.

Metric tensor fields on M_2. We consider the following left-invariant metric tensor fields:

  G = w_1² ω^1 ⊗ ω^1 + w_2² ω^2 ⊗ ω^2 + w_3² ω^3 ⊗ ω^3,    (4)

and write ‖ṗ‖ := sqrt( G_p(ṗ, ṗ) ). Here w_i > 0 are the metric parameters. We also use the dual norm ‖p̂‖_* := sup_{ṗ ∈ T_p M_2} ⟨p̂, ṗ⟩ / ‖ṗ‖. We will assume, without loss of generality, that w_2 ≥ w_1 and introduce the ratio

  ζ := w_2 / w_1 ≥ 1    (5)

that is called the spatial anisotropy of the metric.
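For concreteness, the group product, inverse, and the left-invariant norm of (4) can be implemented directly from the formulas above (a small sketch of our own; the angle convention follows the small-angle identification used in this section):

```python
import numpy as np

def wrap(theta):
    # identify R/(2 pi Z) with [-pi, pi)
    return (theta + np.pi) % (2 * np.pi) - np.pi

def product(g1, g2):
    x1, y1, t1 = g1
    x2, y2, t2 = g2
    return (x1 + x2 * np.cos(t1) - y2 * np.sin(t1),
            y1 + x2 * np.sin(t1) + y2 * np.cos(t1),
            wrap(t1 + t2))

def inverse(g):
    x, y, t = g
    return (-x * np.cos(t) - y * np.sin(t),
            x * np.sin(t) - y * np.cos(t),
            wrap(-t))

def norm(p, pdot, w1, w2, w3):
    # ||pdot||_G at p = (x, y, theta): decompose pdot in the frame A_1, A_2, A_3
    _, _, theta = p
    xd, yd, td = pdot
    a1 = np.cos(theta) * xd + np.sin(theta) * yd    # <omega^1, pdot>
    a2 = -np.sin(theta) * xd + np.cos(theta) * yd   # <omega^2, pdot>
    return np.sqrt((w1 * a1) ** 2 + (w2 * a2) ** 2 + (w3 * td) ** 2)

g = (1.0, 2.0, 0.7)
print(product(g, inverse(g)))   # ~ (0, 0, 0), the identity e
```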
Riemannian distance on M_2. The metric tensor field G induces the left-invariant Riemannian distance

  d_G(p, q) := inf_{γ ∈ Γ_t(p,q)} L_G(γ),  with  L_G(γ) := ∫_0^t ‖γ̇(s)‖ ds,    (6)

where Γ_t(p, q) is the set of piecewise C^1-curves γ in M_2 with γ(0) = p and γ(t) = q. The right-hand side does not depend on t > 0, and we may set t = 1.
If no confusion can arise we omit the subscript G and write d, L, ‖·‖ for short. The distance being left-invariant means that for all g ∈ SE(2) and p, q ∈ M_2 one has d(p, q) = d(g ⊙ p, g ⊙ q). We will often use the shorthand notation d(p) := d(p, p_0).
We often consider the sub-Riemannian case arising when w_2 → ∞. Then we have "infinite cost" for sideways motion and the only "permissible" curves γ are the ones for which γ̇(t) ∈ H, where H := span{A_1, A_3} ⊂ T M_2. This gives rise to a new notion of distance, namely the sub-Riemannian distance d_sr:

  d_sr(p, q) := inf { L_G(γ) | γ ∈ Γ_1(p, q), γ̇ ∈ H }.    (7)

The Lie group exponential exp : se(2) → SE(2) is given by:

  exp( c^1 A_1 + c^2 A_2 + c^3 A_3 ) = ( (c^1 sin c^3 + c^2 (cos c^3 − 1))/c^3, (c^1 (1 − cos c^3) + c^2 sin c^3)/c^3, c^3 ),

understood in the limit when c^3 = 0. And the logarithm:

  log(x, y, θ) = ( (x cos(θ/2) + y sin(θ/2)) / sinc(θ/2), (−x sin(θ/2) + y cos(θ/2)) / sinc(θ/2), θ ) =: (c^1, c^2, c^3).    (8)

By virtue of equation (2) we will freely use the logarithmic coordinates on M_2.

Erosion and Dilation
We will be considering the following Hamilton-Jacobi equation on M_2:

  ∂W_α/∂t = ± H_α(dW_α),  W_α|_{t=0} = U,    (9)

with the Hamiltonian H_α : T*M_2 → R_{≥0} given by

  H_α(p̂) = H^{1D}_α( ‖p̂‖_* ) = (1/α) ‖p̂‖_*^α,

and where W_α are the viscosity solutions [52] obtained from the initial condition U ∈ C(M_2, R). Here the + sign yields a dilation scale space and the − sign an erosion scale space [50,51]. If confusion cannot arise, we omit the superscript 1D. Erosion and dilation correspond to min- and max-pooling, respectively. The Lagrangian L_α : T M_2 → R_{≥0} corresponding with this Hamiltonian is obtained by taking the Fenchel transform of the Hamiltonian:

  L_α(ṗ) = L^{1D}_α( ‖ṗ‖ ) = (1/β) ‖ṗ‖^β,  with 1/α + 1/β = 1.

Again, if confusion cannot arise, we omit the subscript α and/or superscript 1D. We deviate from our previous work by including the factor 1/α and working with a power of α instead of 2α. We do this because it simplifies the relation between the Hamiltonian and the Lagrangian.
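For completeness, the one-dimensional computation behind this Hamiltonian-Lagrangian pair is the standard Fenchel-conjugate (Young's inequality) calculation, using only 1/α + 1/β = 1:

  L^{1D}_α(v) = sup_{x ∈ R} ( v x − (1/α)|x|^α ).

The supremum is attained when v = sign(x)|x|^{α−1}, i.e. |x| = |v|^{1/(α−1)} = |v|^{β−1}; since α(β−1) = β this yields

  L^{1D}_α(v) = |v|^β − (1/α)|v|^β = (1/β)|v|^β,

which is exactly why the dual pair of powers α and β appears throughout this paper.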
The following proposition collects standard results in terms of the solutions of Hamilton-Jacobi equations on manifolds [53-55], thereby generalizing results on R^2 to M_2.

Proposition 1 (Solution erosion & dilation)
Let α > 1. The viscosity solution W_α of the erosion PDE (9) is given by

  W_α(p, t) = inf_{q ∈ M_2} inf_{γ ∈ Γ_t(q, p)} { U(q) + ∫_0^t L_α(γ̇(s)) ds }    (10)
            = inf_{q ∈ M_2} { U(q) + t L^{1D}_α( d(q, p)/t ) }    (11)
            = (k^α_t □ U)(p),    (12)

where the morphological kernel k^α_t : M_2 → R_{≥0} is defined as:

  k^α_t(p) := t L^{1D}_α( d(p_0, p)/t ) = (t/β) ( d(p_0, p)/t )^β.    (13)

Furthermore, the Riemannian distance d := d(p_0, ·) is the viscosity solution of the eikonal PDE

  ‖ dd(p) ‖_* = 1    (14)

with boundary condition d(p_0) = 0. Likewise, the viscosity solution of the dilation PDE is

  W_α(p, t) = −(k^α_t □ (−U))(p) = sup_{q ∈ M_2} { U(q) − k^α_t(q^{-1} ⊙ p) }.    (15)

Proof It is shown by Fathi in [54, Prop. 5.3] that (10) is a viscosity solution of the Hamilton-Jacobi equation (9) on a complete connected Riemannian manifold without boundary, under some (weak) conditions on the Hamiltonian and with the initial condition U being Lipschitz. In [53, Thm. 2] a similar statement is given, but only for compact connected Riemannian manifolds, again under some weak conditions on the Hamiltonian but without any on the initial condition. Next, we employ these existing results and provide a self-contained proof of (11) and (12). Because we are looking at a specific class of Lagrangians, the solutions can be equivalently written as (11); in [53, Prop. 2] this form can also be found. Namely, the Lagrangian L^{1D}_α is convex for α > 1, so for any curve γ ∈ Γ_t := Γ_t(q, p) we have by direct application of Jensen's inequality (omitting the superscript 1D):

  L_α( (1/t) ∫_0^t ‖γ̇(s)‖ ds ) ≤ (1/t) ∫_0^t L_α( ‖γ̇(s)‖ ) ds,

with equality if ‖γ̇‖ is constant. This means that:

  t L_α( L(γ)/t ) ≤ ∫_0^t L_α( ‖γ̇(s)‖ ) ds,    (16)

where L(γ) := L_G(γ), recall (6), is the length of the curve γ. Consider the subset of curves with constant speed Γ̃_t = { γ ∈ Γ_t | ‖γ̇‖ = L(γ)/t } ⊂ Γ_t. Optimizing over a subset can never decrease the infimum, so we have:

  inf_{γ ∈ Γ_t} ∫_0^t L_α( ‖γ̇(s)‖ ) ds ≤ inf_{γ ∈ Γ̃_t} ∫_0^t L_α( ‖γ̇(s)‖ ) ds = inf_{γ ∈ Γ̃_t} t L_α( L(γ)/t ).

The r.h.s. of this equation is equal to the l.h.s. of equation (16), as the length of a curve is independent of its parameterization. Thereby we have equality in (16). By monotonicity of L_α on R_{>0} we may then conclude that:

  inf_{γ ∈ Γ_t} ∫_0^t L_α( ‖γ̇(s)‖ ) ds = t L_α( d(q, p)/t ).

That we can write the solution as (12) is a consequence of the left-invariant metric on the manifold: d(q, p) = d(p_0, q^{-1} ⊙ p), so that U(q) + t L_α( d(q, p)/t ) = U(q) + k^α_t( q^{-1} ⊙ p ). A similar derivation can be found in [28, Thm. 30]. It is shown in [55, Thm. 6.24] for complete connected Riemannian manifolds that the distance map d(p) is a viscosity solution of the eikonal equation (14).
Finally, the solutions of the erosion and dilation PDEs correspond to each other. If W_α is the viscosity solution of the erosion PDE with initial condition U, then −W_α is the viscosity solution of the dilation PDE with initial condition −U. This means that the viscosity solution of the dilation PDE is given by (15). □

Distance Approximations
To calculate the morphological kernel k^α_t (13) we need the exact Riemannian distance d (6), but calculating this is computationally demanding. To alleviate this problem we approximate the exact distance d(p_0, ·) with approximative distances, denoted by ρ : M_2 → R_{≥0}, which are computationally cheap. To this end we define the logarithmic distance approximation ρ_c, as explained in [28, Def. 19] and [56, Def. 6.1.2], by

  ρ_c(p) := sqrt( w_1² (c^1)² + w_2² (c^2)² + w_3² (c^3)² ).    (17)

Note that all approximative distances ρ can be extended to something that looks like a metric on M_2; for example, we can define ρ̃(p, q) := ρ(q^{-1} ⊙ p). But this is almost always not a true metric, in the sense that it does not satisfy the triangle inequality. So in this sense an approximative distance is not necessarily a true distance. However, we will keep referring to them as approximative distances, as we only require them to look like the exact Riemannian distance d(p_0, ·).
As already stated in the introduction, Riemannian distance approximations such as ρ_c begin to fail in the high spatial anisotropy cases ζ ≫ 1. For these situations we need sub-Riemannian distance approximations. In previous literature two such sub-Riemannian approximations are suggested: the first one is the standard estimate (18) from [57, Sec. 6]; the second one (19) is a modified smooth version of it [29, p. 284], also seen in [48, eq. 14]. In [48], ν ≈ 44 is empirically suggested. Note that the sub-Riemannian approximations rely on the assumption that w_2 ≥ w_1. However, they both suffer from a major shortcoming in the interaction between w_3 and c^2: when we let w_3 → 0, both approximations suggest that it becomes arbitrarily cheap to move in the c^2 direction, which is undesirable as this deviates from the exact distance d: moving spatially will always have a cost associated with it, determined by at least w_1.
To make a proper sub-Riemannian distance estimate we will use the Zassenhaus formula, which is related to the Baker-Campbell-Hausdorff formula:

  e^{t(X+Y)} = e^{tX} e^{tY} e^{−(t²/2)[X,Y]} ⋯,    (20)

where we have used the shorthand e^x := exp(x).
Filling in X = A_1 and Y = A_3 (for which [A_1, A_3] = −A_2) and neglecting the higher-order terms gives:

  e^{t(A_1+A_3)} ≈ e^{tA_1} e^{tA_3} e^{(t²/2) A_2},    (21)

or equivalently:

  e^{(t²/2) A_2} ≈ e^{−tA_3} e^{−tA_1} e^{t(A_1+A_3)}.    (22)
This formula says that one can successively follow exponential curves in the "legal" directions A_1 and A_3 to effectively move in the "illegal" direction A_2. Taking the lengths of these curves and adding them up gives an approximative upper bound on the sub-Riemannian distance:

  d_sr( e^{(t²/2) A_2} ⊙ p_0 ) ⪅ |t| ( w_1 + w_3 + sqrt(w_1² + w_3²) ).    (23)

Substituting t → sqrt(2|t|) gives:

  d_sr( e^{t A_2} ⊙ p_0 ) ⪅ sqrt(2|t|) ( w_1 + w_3 + sqrt(w_1² + w_3²) ).    (24)

This inequality, together with the smoothing trick used to go from (18) to (19), then inspires the sub-Riemannian distance approximation ρ_c,sr of (25), for some 0 < ν < 2√2 such that the approximation is tight. We empirically suggest ν ≈ 1.6, based on a numerical analysis that is tangential to [48, Fig. 3]. Notice that this approximation does not break down when we let w_3 → 0.
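The approximate identity (22) is easy to verify numerically by representing se(2) with 3 × 3 matrix generators (a quick sanity check of our own, not part of the derivation):

```python
import numpy as np
from scipy.linalg import expm

# Generators of se(2) as 3x3 matrices (homogeneous-coordinate convention);
# one checks [A1, A3] = -A2, as used in the text.
A1 = np.array([[0., 0., 1.], [0., 0., 0.], [0., 0., 0.]])
A2 = np.array([[0., 0., 0.], [0., 0., 1.], [0., 0., 0.]])
A3 = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 0.]])
assert np.allclose(A1 @ A3 - A3 @ A1, -A2)

t = 0.1
lhs = expm(-t * A3) @ expm(-t * A1) @ expm(t * (A1 + A3))
rhs = expm(t ** 2 / 2 * A2)
print(np.abs(lhs - rhs).max())   # O(t^3): the two sides agree up to ~1e-4
```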
Table 3 shows that both the old sub-Riemannian approximation (19) and the new approximation (25) are appropriate in cases such as w_3 = 1. Table 4 shows that the old approximation breaks down when we take w_3 = 0.5, and that the new approximation behaves more appropriately.
The Riemannian and sub-Riemannian approximations can be combined into the following newly proposed practical approximation:

  ρ_c,com := max( l, min(ρ_c,sr, ρ_c) ),    (26)

where l : M_2 → R is given by:

  l(p) := sqrt( w_1² (x² + y²) + w_3² θ² ),    (27)

for which we will show that it is a lower bound of the exact distance d in Lemma 4.
The most important property of the combined approximation is that it automatically switches between the Riemannian and sub-Riemannian approximations depending on the metric parameters. Namely, the Riemannian approximation is appropriate very close to the reference point p_0, but tends to overestimate the true distance at a moderate distance from it. The sub-Riemannian approximation is appropriate at moderate distances from p_0, but tends to overestimate very close to it, and to underestimate far away. The combined approximation is such that we get rid of the weaknesses that the approximations have on their own.
On top of these approximative distances, we also define ρ_b, ρ_b,sr, and ρ_b,com by replacing the logarithmic coordinates c^i by their corresponding half-angle coordinates b^i, defined by:

  b^1 := x cos(θ/2) + y sin(θ/2),  b^2 := −x sin(θ/2) + y cos(θ/2),  b^3 := θ.    (28)

So, for example, we define ρ_b as:

  ρ_b := sqrt( w_1² (b^1)² + w_2² (b^2)² + w_3² (b^3)² ).    (29)

Why we use these coordinates will be explained in Section 5.1. We can define approximative morphological kernels by replacing the exact distance in (13) by any of the approximative distances in this section. To this end we, for example, define k_b by replacing the exact distance in the morphological kernel k by ρ_b:

  k^α_{b,t}(p) := (t/β) ( ρ_b(p)/t )^β,    (30)

where we recall that 1/α + 1/β = 1 and α > 1.
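The half-angle machinery above translates directly into code (a sketch of our own; the sub-Riemannian estimate ρ_b,sr of (25) is passed in as a function, since its tuned form with the constant ν is given in the text):

```python
import numpy as np

def half_angle(x, y, theta):
    # half-angle coordinates (28)
    b1 = x * np.cos(theta / 2) + y * np.sin(theta / 2)
    b2 = -x * np.sin(theta / 2) + y * np.cos(theta / 2)
    return b1, b2, theta

def rho_b(p, w1, w2, w3):
    # Riemannian half-angle estimate (29)
    b1, b2, b3 = half_angle(*p)
    return np.sqrt((w1 * b1) ** 2 + (w2 * b2) ** 2 + (w3 * b3) ** 2)

def lower_bound(p, w1, w3):
    # the global lower bound l of (27) (see also Lemma 4)
    x, y, theta = p
    return np.sqrt(w1 ** 2 * (x ** 2 + y ** 2) + (w3 * theta) ** 2)

def rho_b_com(p, w1, w2, w3, rho_b_sr):
    # combined estimate, cf. (26): max(l, min(rho_sr, rho_b))
    return max(lower_bound(p, w1, w3),
               min(rho_b_sr(p), rho_b(p, w1, w2, w3)))

p = (1.0, 0.5, np.pi / 3)
print(rho_b(p, w1=1.0, w2=4.0, w3=1.0))
```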

Main Theorem and Analysis
When the effect of erosion and dilation is calculated with an approximative morphological kernel, an error is made. We are therefore interested in analyzing the behaviour of this error. We do this by comparing the approximative morphological kernels with the exact kernel k^α_t (13). The result of our analysis is summarized in the following theorem. Because there is no time dependency in the inequalities of our main result, we use the short notation k^α := k^α_t and k^α_b := k^α_{b,t}, leaving t > 0 implicit.

Theorem 1 (Quality of approximative morphological kernels) Let ζ := w_2/w_1 denote the spatial anisotropy, and let β be such that 1/α + 1/β = 1, for some fixed α > 1. We assess the quality of our approximative kernels in three ways:
• The exact and all approximative kernels have the same symmetries, see Table 5.
• Globally it holds that:

  ζ^{−β} k^α ≤ k^α_b ≤ ζ^{β} k^α,    (31)

from which we see that in the case ζ = 1 we have that k^α_b is exactly equal to k^α.
• Locally around¹ p_0 we have the relative error bound (32), with leading factor (33).

Proof The proofs of the parts of the theorem are discussed throughout the upcoming subsections:
• The symmetries are shown in Corollary 1.
• The global bound (31) is shown in Corollary 3.
• The local bound (32) is shown in Corollary 5.
Clearly, as all approximative kernels are solely functions of the corresponding approximative distances, the analysis of the quality of an approximative kernel reduces to analysing the quality of the approximative distance that is used, and this is exactly what we will do.
¹ For a precise statement, see Lemma 7 and Remark 3.
In previous work on PDE-G-CNNs the bound d = d(p_0, ·) ≤ ρ_c is proven [28, Lem. 20]. Furthermore, it is shown that around p_0 one has:

  ρ_c² ≤ d² + O(d⁴),    (34)

which has the corollary that for any compact neighbourhood around p_0 there exists a constant C ≥ 1 such that

  d ≤ ρ_c ≤ C d    (35)

holds on it. We improve on these results by:
• Showing that the approximative distances have the same symmetries as the exact Riemannian distance; Lemma 3.
• Finding simple global bounds on the exact distance d, which can then be used to find global estimates of ρ_b by d; Lemma 4. This improves upon (35) by finding an expression for the constant C.
• Estimating the leading term of the asymptotic expansion, and observing that our upper bound on the relative error between ρ_b and d explodes in the cases ζ → ∞ and w_3 → 0; Lemma 7. This improves upon equation (34).
Note, however, that we are not analysing ρ_c: we will be analysing ρ_b. This is mainly because the half-angle coordinates are easier to work with: they do not have the sinc(θ/2) factor that the logarithmic coordinates have. Using that

  b^i = sinc(θ/2) c^i for i ∈ {1, 2}, and b^3 = c^3 = θ,    (36)

recall (28) and (8), we see that locally ρ_c and ρ_b do not differ much, and results on ρ_b can be easily transferred to (slightly weaker) results on ρ_c.
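To make this closeness explicit (a short check of our own, based on (36)): since sinc(θ/2) = 1 − θ²/24 + O(θ⁴), we get

  ρ_b² = ρ_c² − ( 1 − sinc²(θ/2) ) ( w_1² (c^1)² + w_2² (c^2)² ) = ρ_c² ( 1 + O(θ²) ),

so the two estimates agree up to a relative error of order θ² near p_0.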
The symmetries ε_1, ε_2, and ε_6 are defined in (37). They generate the other four non-trivial symmetries, with ε_0 = id. All symmetries are involutions. Henceforth all eight symmetries will be called 'fundamental symmetries'. How all fundamental symmetries relate to each other becomes clearer if we write them down in either logarithmic or half-angle coordinates.
Lemma 1 (8 fundamental symmetries) The 8 fundamental symmetries ε_i, in either half-angle coordinates b^i or logarithmic coordinates c^i, correspond to sign flips as laid out in Table 5.
Proof We will only show that ε_2 flips b^1; all other calculations are done analogously. Pick a point p = (x, y, θ) and let q = ε_2(p). We now calculate b^1(q):

where we have used the trigonometric difference identities of cosine and sine in the second-to-last equality. □
From the relation between logarithmic and half-angle coordinates (36) we have that the logarithmic coordinates c i flip in the same manner under the symmetries.
The fixed points of the symmetries ε_2, ε_1, and ε_6 have an interesting geometric interpretation. The logarithmic and half-angle coordinates, being so closely related to the fundamental symmetries, also carry the same interpretation. Definition 1 introduces this geometric idea, and Lemma 2 makes its relation to the fixed points of the symmetries precise. In Figure 11 the fixed points are visualized, and in Figure 12 a visualization of these geometric ideas can be seen.
Definition 1 Two points p_1 = (x_1, n_1), p_2 = (x_2, n_2) of M_2 are called cocircular if there exists a circle, of possibly infinite radius, passing through x_1 and x_2 such that the orientations n_1 ∈ S^1 and n_2 ∈ S^1 are tangent to the circle, at x_1 and x_2 respectively, in either both the clockwise or both the anti-clockwise direction. Similarly, the points are called coradial if the orientations are normal to the circle in either both the outward or both the inward direction. Finally, two points are called parallel if their orientations coincide.
Co-circularity has a well-known characterisation that is often used for line enhancement in image processing, such as tensor voting [58].
In fact, all fixed points of the fundamental symmetries can be intuitively characterised, see (40):

Fig. 11: The fixed points of ε_2, ε_1, and ε_6. For ε_2 and ε_1 only the points within the region x² + y² ≤ 2² are plotted; for ε_6 only the points in the region max(|x|, |y|) ≤ 2. The fixed points of ε_2, ε_1, and ε_6 correspond respectively to the points in M_2 that are coradial, cocircular, and parallel to the reference point p_0.

Proof We will only show (40); the others are done analogously. We start by writing g = (r cos φ, r sin φ, θ) and calculating that g ⊙ p_0 = (r cos φ, r sin φ, (cos θ, sin θ)).
Then by Remark 1 we know that g ⊙ p_0 is cocircular to p_0 if and only if 2φ = θ mod 2π. We can show that this is equivalent to c^2(g) = 0, from which we may deduce that ε_1(g) = g is equivalent to c^2(g) = 0. If c^2(g) = 0 then log g ∈ span{A_1, A_3}, and thus g ∈ exp(span{A_1, A_3}). The other way around holds by simple computation. □

In the important work [44] on sub-Riemannian geometry on SE(2) by Sachkov and Moiseev, it is shown that the exact sub-Riemannian distance d_sr is invariant under the fundamental symmetries ε_i. These same symmetries hold true for the Riemannian distance d. Moreover, because the approximative distances use the logarithmic coordinates c^i and half-angle coordinates b^i, they also carry the same symmetries. The following lemma makes this precise.
Lemma 3 (Symmetries of the exact distance and all proposed approximations) All exact and approximative (sub-)Riemannian distances (w.r.t. the reference point p_0) are invariant under all the fundamental symmetries ε_i.
Proof By Table 5 one sees that ε_3, ε_4, and ε_5 also generate all symmetries. Therefore, if we show that all distances are invariant under these three select symmetries, we have also shown that they are invariant under all symmetries. We will first show that the exact distance, in either the Riemannian or sub-Riemannian case, is invariant w.r.t. these three symmetries, i.e. d(p) = d(ε_i(p)) for i ∈ {3, 4, 5}.
That all approximative distances (both in the Riemannian and sub-Riemannian case) are also invariant under all the symmetries is not hard to see: every b^i and c^i term is either squared or has its absolute value taken. Flipping the signs of these coordinates, recall Lemma 1, has no effect on the approximative distance. □

Corollary 1 (All kernels preserve symmetries)
The exact kernel and all approximative kernels have the same fundamental symmetries.
In Figure 10 the previous lemma can be seen. The two fundamental symmetries ε_2 and ε_1 correspond, respectively, to reflecting the isocontours (depicted in colors) along their short edge and long axis. The ε_6 symmetry corresponds to mapping the positive-θ isocontours to their negative-θ counterparts. In Figure 13 one can see an isocontour of ρ_b together with the symmetry "planes" of ε_2, ε_1 and ε_6.

Fig. 13: In grey the isocontour ρ_b = 2.5, together with the symmetry "planes" of ε_2, ε_1 and ε_6, as also plotted in Figure 11. The metric parameters are (w_1, w_2, w_3) = (1, 2, 1).

Simple Global Bounds
Next we provide some basic global lower and upper bounds for the exact Riemannian distance d (6). Recall that the lower bound l plays an important role in the combined approximation ρ_c,com (26) when far from the reference point p_0.

Lemma 4 (Global bounds on distance)
The exact Riemannian distance d = d(p_0, ·) is greater than or equal to the following lower bound l : M_2 → R:

  l(p) = sqrt( w_1² (x² + y²) + w_3² θ² ),

and less than or equal to the following upper bounds u_1, u_2 : M_2 → R:

  u_1(p) = sqrt( w_2² (x² + y²) + w_3² θ² ),  u_2(p) = w_1 sqrt(x² + y²) + w_3 π.

Proof We will first show l ≤ d. Consider the following spatially isotropic metric tensor field:

  G̃ = w_1² ω^1 ⊗ ω^1 + w_1² ω^2 ⊗ ω^2 + w_3² ω^3 ⊗ ω^3.

We assumed w.l.o.g. that w_1 ≤ w_2, so for any vector v ∈ T M_2 we have ‖v‖_G̃ ≤ ‖v‖_G. From this we can directly deduce that for any curve γ on M_2 we have L_G̃(γ) ≤ L_G(γ). Now consider a length-minimizing curve γ w.r.t. G between the reference point p_0 and some end point p. We then have the chain of (in)equalities:

  d_G̃(p_0, p) ≤ L_G̃(γ) ≤ L_G(γ) = d_G(p_0, p).

Furthermore, because the metric G̃ is spatially isotropic, it can equivalently be written as:

  G̃ = w_1² dx ⊗ dx + w_1² dy ⊗ dy + w_3² dθ ⊗ dθ,

which is a constant metric in the coordinate covector fields, and thus:

  d_G̃(p_0, p) = sqrt( w_1² (x² + y²) + w_3² θ² ) = l(p).

Putting everything together gives the desired result l ≤ d. Showing that d ≤ u_1 is done analogously.
As for showing d ≤ u_2, we will construct a curve γ whose length L(γ) w.r.t. G can be bounded from above by u_2; this in turn shows d ≤ u_2 by definition of the distance. Pick a destination position and orientation p = (x, n). The constructed curve γ is as follows. We start by aligning our starting orientation n_0 = (1, 0) ∈ S^1 towards the destination position x. This desired orientation towards x is x̂ := x/r, where r = ‖x‖ = sqrt(x² + y²). This action costs w_3 a for some a ≥ 0. Once we are aligned with x̂ we move towards x. Because we are aligned, this action costs w_1 r. Now that we are at x, we align our orientation with the destination orientation n, which costs w_3 b for some b ≥ 0. Altogether we have L(γ) = w_1 r + w_3(a + b). In its current form the constructed curve does not necessarily satisfy a + b ≤ π, as desired. To fix this we realise that we did not necessarily have to align with x̂: we could have aligned with −x̂ and moved backwards towards x, which also costs w_1 r. One can show that one of the two methods (either moving forwards or backwards towards x) indeed has a + b ≤ π, and thus d ≤ u_2. □
These bounds are simple but effective: they help us prove a multitude of insightful corollaries.

Corollary 2 (Global error distance)
Simple manipulations, together with the fact that x² + y² = (b^1)² + (b^2)², give the following inequalities between l, u_1, and ρ_b:

  l ≤ ρ_b ≤ u_1,  u_1 ≤ ζ l.    (41)

The second inequality can be extended to inequalities between ρ_b and d:

  ζ^{−1} ρ_b ≤ d ≤ ζ ρ_b.    (42)

Remark 2 If w_1 = w_2 ⇔ ζ = 1, i.e. in the spatially isotropic case, the lower and upper bound coincide, thus becoming exact. Because ρ_b lies between the lower and upper bound, it also becomes exact.

Corollary 3 (Global error kernel)
Globally, the error is independent of time t > 0 and is estimated by the spatial anisotropy ζ ≥ 1 (5) as follows:

  ζ^{−β} k^α ≤ k^α_b ≤ ζ^{β} k^α.

Proof We will only prove the second inequality; the first is done analogously. By (42) we have ρ_b ≤ ζ d, and since the kernels (13) and (30) are increasing powers of the (approximative) distance, we get k^α_{b,t} = (t/β)(ρ_b/t)^β ≤ (t/β)(ζ d/t)^β = ζ^β k^α_t. □
The previous result indicates that problems can arise as ζ → ∞, which indeed turns out to be the case:

Corollary 4 (Observing the problem) If we restrict ourselves to x = θ = 0, we have u_1 = ρ_b = ρ_c = w_2|y|. From this we can deduce that both ρ_b and ρ_c become bad approximations away from p_0. Namely, when ζ > 1 ⇔ w_2 > w_1, both approximations exceed the upper bound u_2 if one looks far enough away from p_0. How "fast" this happens is determined by all metric parameters. Namely, the intersection of the approximations ρ_b and ρ_c with u_2 is at |y| = w_3 π/(w_2 − w_1), or equivalently at ρ = w_3 π/(1 − ζ^{−1}). This intersection is visible in Figure 14 in the higher anisotropy cases. From this expression for the intersection we see that in the cases w_3 → 0 and ζ → ∞ the Riemannian distance approximations ρ_b and ρ_c quickly deteriorate. We will see exactly the same behaviour in Lemma 7 and Remark 3.
Lemma 4 is visualized in Figures 14 and 15. In Figure 14 we consider the behavior of the exact distance and its bounds along the y-axis, that is, at x = θ = 0. We have chosen to inspect the y-axis because it consists of points that are hard to reach from the reference point p_0 when the spatial anisotropy is large, which makes it interesting. In contrast, along the x-axis l, d, ρ_b, ρ_c, u_1 and w_1|x| all coincide, which is therefore uninteresting. To provide more insight we also depict the bounds along the y = x axis, see Figure 15. Observe that in both figures the exact distance d is indeed always above the lower bound l and below the upper bounds u_1 and u_2.

Fig. 14: Exact distance and its lower and upper bounds (given in Lemma 4) along the y-axis, i.e. at x = θ = 0, for increasing spatial anisotropy. We keep w_1 = w_3 = 1 and vary w_2. The horizontal axis is y and the vertical axis is the value of the distance/bound. Note how the exact distance d starts off linearly with a slope of w_2, and ends linearly with a slope of w_1.

Asymptotic Error Expansion
In this section we provide an asymptotic expansion of the error between the exact distance d and the half-angle distance approximation ρ_b (Lemma 7). This error bound is then leveraged into an error bound between the exact morphological kernel k and the half-angle kernel k_b (Corollary 5). We also give a formula that determines a region on which the half-angle approximation ρ_b is appropriate, given an a priori tolerance bound (Remark 3).
Lemma 5 Let γ : [0, 1] → M_2 be a minimizing geodesic from p_0 to p. We have that:

Proof The fundamental theorem of calculus tells us that:

but one can also bound this expression as follows:

Putting the two together gives the desired result. □
Lemma 6 One can bound dρ_b around p_0 by:

Proof The proof is deferred to Appendix A. □

By combining the simple Lemmas 5 and 6 one can find an expression for the asymptotic error between the exact distance d and the half-angle approximation ρ_b.
Lemma 7 On any compact neighbourhood U of p_0 we have the error expansion (43), for some C ≥ 0.
Proof Let p ∈ U be given, and let γ : [0, 1] → M_2 be the minimizing geodesic from p_0 to p. For the distance we know that d(γ(s)) ≤ d(γ(t)) for s ≤ t. Making use of (42), we know that ζ^{−1} ρ_b ≤ d ≤ ζ ρ_b, so we can combine this with the previous inequality to bound ρ_b along the geodesic, from which we get a bound on max_{s ∈ [0,1]} ρ_b(γ(s)). Combining this fact with the above two lemmas allows us to conclude (43). □
Remark 3 (Region for approximation ρ_b ≈ d) Putting an a priori tolerance bound ε_tol on the error ε (and neglecting the O(θ³) term) gives rise to a region Ω_0 on which the local approximation ρ_b is appropriate. Thereby we cannot guarantee a large region of acceptable relative error when w_3 → 0 or ζ → ∞. We solve this problem by using ρ_b,com, given in (26), instead of ρ_b.

Corollary 5 (Local error morphological kernel)
Locally around p_0 we have the kernel analogue (44) of the distance bound (43).

Proof By Lemma 7 one has the distance bound (43); substituting it into the kernel definitions (13) and (30) gives the result. □

Experiments

Error of Half Angle Approximation
We can quantitatively analyse the error between any distance approximation ρ and the exact Riemannian distance d as follows. We first choose a region Ω ⊆ M_2 in which we analyse the approximation; just as in Tables 1 and 2, we inspect the same region Ω as shown there. As our exact measure of error ε we have decided on the mean relative error, defined as:

  ε := (1/μ(Ω)) ∫_Ω |ρ(p) − d(p)| / d(p) dμ(p),    (45)

where μ is the induced Riemannian measure determined by the Riemannian metric G. We then discretized our domain Ω into a grid of 101 × 101 × 101 equally spaced points p_i ∈ Ω, indexed by some index set i ∈ I, and numerically solved for the exact distance d on this grid. This numerical scheme is of course not exact, and we will refer to these values as d̂_i ≈ d(p_i). We also calculate the value of the distance approximation ρ on the grid points: ρ_i := ρ(p_i). Once we have these values we can approximate the true mean relative error ε by calculating the numerical error ε̂ defined by:

  ε̂ := (1/|I|) Σ_{i ∈ I} |ρ_i − d̂_i| / d̂_i.    (46)

In Table 6 the numerical mean relative error ε̂ between the half-angle approximation ρ_b and the numerical Riemannian distance d̂ can be seen for different spatial anisotropies ζ. We keep w_1 = w_3 = 1 constant and vary w_2. We see, as shown visually in Tables 1 and 2, that ρ_b gets worse and worse as we increase the spatial anisotropy ζ.
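Computing ε̂ is straightforward once the numerical distance d̂ is available on the grid (a sketch; we assume d̂ has been precomputed with some eikonal solver on M_2, as in [47]):

```python
import numpy as np

def numerical_mean_relative_error(rho_grid, d_grid, eps=1e-12):
    # discretized mean relative error, cf. (46); mask out p_0 where d = 0
    mask = d_grid > eps
    return float(np.mean(np.abs(rho_grid[mask] - d_grid[mask]) / d_grid[mask]))

rho = np.array([1.0, 2.0, 3.0])     # approximation on (dummy) grid points
d_hat = np.array([1.1, 1.8, 3.3])   # numerical exact distance on the same grid
print(numerical_mean_relative_error(rho, d_hat))
```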
There is a discrepancy in the table worth mentioning. We know from Remark 2 that when ζ = 1, then ρ_b = d and thus ε = 0. But surprisingly we do not have ε̂ = 0 in the ζ = 1 case in Table 6. This is simply explained by the fact that the numerical solution d̂ is not exactly equal to the true distance d. We expect that ε̂ will go to 0 in the ζ = 1 case if we discretize our region Ω more and more finely. We can compare these numerical results to our theoretical results. Namely, we can deduce from equation (42) that:

  |ρ_b − d| / d ≤ ζ − 1,

which means ε ≤ ζ − 1.    (47)

And so we expect this to also approximately hold for the numerical mean relative error ε̂. Indeed, in Table 6 we can see that ε̂ ⪅ ζ − 1.

Interestingly, we see that ε̂ is relatively small compared to our theoretical bound (47), even in the high anisotropy cases. However, this is only a consequence of the relative smallness of Ω. If we make Ω bigger and bigger, we can be certain that ε̂ converges to ζ − 1. This follows from an argument similar to the reasoning in Corollary 4.

DCA1
The DCA1 dataset is a publicly available database "consisting of 130 X-ray coronary angiograms, and their corresponding ground-truth image outlined by an expert cardiologist" [59]. One such angiogram and its ground truth can be seen in Figures 18a and 18d.
We have split the DCA1 dataset [59] into a training and test set consisting of 125 and 10 images respectively.
To establish a baseline we ran 3-, 6-, and 12-layer CNNs, G-CNNs and PDE-G-CNNs on DCA1. The exact architectures are identical/analogous to the ones used in [28, Fig. 15]. For the baseline the logarithmic distance approximation ρ_c was used within the PDE-G-CNNs; this is the same approximation that was used in [28]. Every network was trained 10 times for 80 epochs. After every epoch the average Dice coefficient on the test set is stored. After every full training the maximum of the average Dice coefficients over all 80 epochs is calculated. The result is 10 maximum average Dice coefficients for every architecture.
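For reference, the Dice coefficient we track is the standard overlap score 2|A ∩ B|/(|A| + |B|) between the binarized prediction and the ground truth (the 0.5 binarization threshold below is our assumption):

```python
import numpy as np

def dice_coefficient(pred, truth, threshold=0.5, eps=1e-8):
    # Dice = 2 |A and B| / (|A| + |B|) for binary masks A, B
    a = pred >= threshold
    b = truth >= 0.5
    return (2.0 * np.logical_and(a, b).sum() + eps) / (a.sum() + b.sum() + eps)
```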
The results of this baseline can be seen in Figure 16. The number of parameters of the networks can be found in Table 7. We see that PDE-G-CNNs consistently perform as well as, and sometimes outperform, G-CNNs and CNNs, all the while having the fewest parameters of all architectures.
To compare the effect of using different approximative distances, we decided to train the 6-layer PDE-G-CNN (with 2560 parameters) 10 times for 80 epochs using each distance approximation. The results can be found in Figures 17 and 18. We see that on DCA1 all distance approximations have a comparable performance. We notice a small dent in effectiveness when using ρ_b,sr, and a small increase when using ρ_b,com.

Lines
For the line completion problem we created a dataset of 512 training images and 128 test images². Figures 21a and 21d show one sample of the Lines dataset. To establish a baseline we ran a 6-layer CNN, G-CNN and PDE-G-CNN. For this baseline we again used ρ_c within the PDE-G-CNN, but changed the number of channels to 30 and the kernel sizes to [9, 9, 9], making the total number of parameters 6018. By increasing the kernel size we anticipate that the difference in effectiveness of using the different distance approximations, if there is any, becomes more pronounced. Every network was trained 15 times for 60 epochs. The result of this baseline can be seen in Figure 19.
The number of parameters of the networks can be found in Table 8. We again see that the PDE-G-CNN outperforms the G-CNN, which in turn outperforms the CNN, while having the fewest parameters. We again test the effect of using different approximative distances by training the 6-layer PDE-G-CNN 15 times for 60 epochs for every approximation. The results can be found in Figure 20. We see that on the Lines dataset all distance approximations again have a comparable performance. We again notice an increase in effectiveness when using ρ_b,com, just as on the DCA1 dataset. Interestingly, using ρ_b,sr does not seem to hurt the performance on the Lines dataset, which is in contrast with DCA1. This is in line with what one would expect in view of the existing sub-Riemannian line-perception models in neurogeometry.

Conclusion
In this article we have carefully analyzed how well the nonlinear erosion and dilation parts of PDE-G-CNNs are actually solved on the homogeneous space of two-dimensional positions and orientations M_2.
According to Proposition 1, the Hamilton-Jacobi equations are solved by morphological kernels that are functions of only the exact (sub-)Riemannian distance. As a result, every approximation of the exact distance yields a corresponding approximative morphological kernel.
In Theorem 1 we use this to improve upon local and global approximations of the relative errors of the erosion and dilation kernels used in the papers [28,60] where PDE-G-CNNs were first proposed (and shown to outperform G-CNNs). Our new, sharper estimates for the distance on M_2 have bounds that explicitly depend on the metric tensor field coefficients. This allowed us to theoretically underpin the earlier worries expressed in [28, Fig. 10] that if the spatial anisotropy becomes high, the previous morphological kernel approximations [28] become less and less accurate.
Indeed, as we show qualitatively in Table 2 and quantitatively in Section 6.1, if the spatial anisotropy ζ is high, one must resort to sub-Riemannian approximations. Furthermore, we propose a single distance approximation ρ_b,com that works for both low and high spatial anisotropy.
Apart from how well the kernels approximate the PDEs, there is the issue of how well each of the distance approximations performs in applications within the PDE-G-CNNs. In practice, the analytic approximative kernels using ρ_b, ρ_c, ρ_b,com perform similarly. This is not surprising, as our theoretical results Lemma 3 and Corollary 1 reveal that all morphological kernel approximations carry the correct 8 fundamental symmetries of the PDE. Nevertheless, Figures 17 and 20 do reveal advantages of using the new kernel approximations (in particular ρ_b,com) over the previous kernel ρ_c of [28].
The experiments also show that the strictly sub-Riemannian distance approximation ρ_b,sr only performs well on applications where sub-Riemannian geometry really applies. For instance, as can be seen in Figures 17 and 20, on the DCA1 dataset ρ_b,sr performs relatively poorly, whereas on the Lines dataset ρ_b,sr performs well. This is what one would expect in view of sub-Riemannian models and findings in cortical line-perception [37,38,40,41,46,61] in neurogeometry.
Besides better accuracy and better performance of the approximative kernels, there is the issue of geometric interpretability. In G-CNNs and CNNs geometric interpretability is absent, as they include ad-hoc nonlinearities like ReLUs. PDE-G-CNNs instead employ morphological convolutions with kernels that reflect association fields, as visualized in Figure 5b. In Figure 8 we see that, as network depth increases, association fields visually merge in the feature maps of PDE-G-CNNs towards adaptive line detectors, whereas such merging/grouping of association fields is not visible in normal CNNs.
In all cases, the PDE-G-CNNs still outperform G-CNNs and CNNs on the DCA1 and Lines datasets: they have a higher (or equal) performance, while having a huge reduction in network complexity, even when using 3 layers. Regardless of the choice of kernel ρ_c, ρ_b, ρ_b,sr, ρ_b,com, the advantage of PDE-G-CNNs over G-CNNs and CNNs is significant, as can be clearly observed in Figures 16 and 19 and Tables 7 and 8. This is in line with previous observations on other datasets [28].
Altogether, PDE-G-CNNs offer better network complexity reduction, performance, and geometric interpretability than basic classical feed-forward (G-)CNN networks on various segmentation problems.
Extensive investigations on training data reduction, memory reduction (via U-Net versions of PDE-G-CNNs), and a topological description of the merging of association fields are beyond the scope of this article, and are left for future work.

Appendix A Proof of Lemma 6

Finally, the lemma follows by algebraic manipulations and the fact that w_1 ≤ w_2.

Appendix B Geometric Interpretation of PDE-G-CNN layers
In a PDE-G-CNN layer [28,60] one first performs convection and then a morphological convolution (dilation/erosion). This has the interesting effect that we can interpret it equivalently as performing a morphological convolution with a shifted morphological kernel. To make this precise we first define what convection exactly is (Definition 2). The solution of this left-invariant transport ('convection') PDE is quite simple, and we state it in the following proposition without proof, where we identify v with a tangent vector in T_e SE(2).
For the proof, and more details on how convection is implemented in practice within the PDE-G-CNN framework, we refer to [28, Sec. 5.1]. The general idea is that the characteristics of the left-invariant flow are Lie group exponential curves acting on the reference point p_0 ∈ M_2 in the homogeneous space.
We can now show that first performing convection and then a morphological convolution is the same as doing a morphological convolution with a shifted kernel (Proposition 3). Recall the relation between (approximative) Riemannian balls and association fields, as visualised in Figures 4, 5 and 9.
The top-left corner of Figure B1 shows a single PDE-G-CNN module (i.e. the operator between two nodes in the network). The top-right shows the geometric rationale behind PDE-G-CNNs, which essentially perform perceptual grouping of association fields via training, and the bottom two rows of Figure B1 reveal how the grouping of association fields becomes visible in the feature maps of two input test images. In comparison, this (for PDE-G-CNNs) typical geometric behavior is absent in the feature maps of CNNs applied to the same images, recall Figure 8.

Fig. 3: An example of an image together with its orientation score. We can see that the image, a real-valued function on R^2, is lifted to an orientation score, a real-valued function on M_2. Notice that the lines that are crossing in the left image are disentangled in the orientation score.

Fig. 7: The overall architecture for a PDE-G-CNN performing line completion on the Lines dataset. Note how the input image is lifted to an orientation score that lives in the higher-dimensional space M_2, run through PDE-G-CNN layers (Figures 1 and 2), and afterwards projected back down to R^2. Usually this projection is done by taking the maximum value of a feature map over the orientations θ, for every position (x, y) ∈ R^2.

Fig. 15: Same setting as Figure 14 but at x = y, θ = 0. The horizontal axis moves along the line x = y.

Fig. 16: A scatterplot showing how 3-, 6-, and 12-layer CNNs, G-CNNs, and PDE-G-CNNs compare on the DCA1 dataset. The crosses indicate the mean. We see the PDE-G-CNNs provide equal or better results with respectively 2, 10 and 35 times fewer parameters, see Table 7.

Fig. 17: A scatterplot showing how the use of different distance approximations affects the performance of the 6-layer PDE-G-CNN on the DCA1 dataset. The crosses indicate the mean.

Fig. 18: In 18a and 18d we see one sample from the DCA1 dataset: a coronary angiogram together with the ground-truth segmentation. The other four pictures show the output of the 6-layer PDE-G-CNN, one for each distance approximation. The networks used in this figure have an accuracy approximately equal to the mean accuracy in Figure 17.

Fig. 19: A scatterplot showing how a 6-layer CNN, G-CNN (both with ≈ 25k parameters), and a PDE-G-CNN (with only 6k parameters) compare on the Lines dataset. The crosses indicate the mean. For the precise number of parameters see Table 8.

Fig. 20: A scatterplot showing how the use of different distance approximations affects the performance of the 6-layer PDE-G-CNN on the Lines dataset. The crosses indicate the mean.

Fig. 21: In 21a and 21d we see one sample from the Lines dataset. The other four pictures are visualizations of feature maps of the 6-layer PDE-G-CNN. In 21b and 21e we see a feature map of the lifting layer together with its max-projection over θ. In 21c and 21f we see a feature map of the last PDE layer, just before the final projection layer.

Definition 2 (Convection) Let v ∈ T_{p_0}(M_2) be a tangent vector at the reference point p_0, and let c : M_2 → T(M_2) be the corresponding left-invariant vector field obtained by pushing v forward with the left-action L_g(p) := g ⊙ p, i.e. c(g ⊙ p_0) = (L_g)_* v. Convection is defined as:

  ∂W/∂t = −c W,  W|_{t=0} = U,

where both W and U are scalar differentiable functions on M_2.

Fig. B1: A PDE-G-CNN module trains a left-invariant convection vector field c, a Riemannian ball over which we apply max-pooling (dilation, for line excitation), and a Riemannian ball over which we apply min-pooling (erosion, for inhibition/sharpening). Top: by Proposition 3, a PDE-G-CNN module trains 1) a center point (blue), 2) an association field for excitation (green), and 3) an association field for inhibition (red). Bottom: as network depth increases, these association fields group together, as visible in the feature maps.

Table 3: Same situation and metric parameters as Table 2, i.e. w_1 = w_3 = 1 and w_2 = 8. We see the exact distance d alongside the old sub-Riemannian approximation ρ_b,sr,old (19) and the new approximation ρ_b,sr (25). For the old approximation we chose ν = 44, as suggested in [48], and for the new one ν = 1.6. We see that in this case both approximations are appropriate.

Table 4: Same as Table 3 but with w_1 = 1, w_2 = 8, w_3 = 0.5. We see that in this case the old sub-Riemannian approximation ρ_b,sr,old (19) underestimates the true distance and becomes less appropriate. The new approximation (25) is also not perfect, but it is qualitatively better; decreasing w_3 would exaggerate this effect even further.

Table 6: Numerical mean relative error ε̂ between ρ_b and d̂ for multiple spatial anisotropies ζ.

Table 7: The total number of parameters in the networks used in Figure 16.

Table 8: The total number of parameters in the networks used in Figure 19.