1 Introduction

Many classification, segmentation, and tracking tasks in computer vision and digital image processing require some form of “symmetry.” Think, for example, of image classification. If one rotates, reflects, or translates an image, the classification stays the same. We say that an ideal image classification is invariant under these symmetries. A slightly different situation is image segmentation. In this case, if the input image is changed in some way, the output should change accordingly. Therefore, an ideal image segmentation is equivariant with respect to these symmetries.

Many computer vision and image processing problems are currently being tackled with neural networks (NNs). It is desirable to design neural networks in such a way that they respect the symmetries of the problem, i.e., make them invariant or equivariant. Think, for example, of a neural network that detects cancer cells. It would be disastrous if, for example, slightly translating an image caused the neural network to give a totally different diagnosis, even though the input is essentially the same.

Fig. 1

The difference between a traditional CNN layer and a PDE-G-CNN layer. In contrast to traditional CNNs, the layers in a PDE-G-CNN do not depend on ad hoc nonlinearities like ReLUs, and are instead implemented as solvers of (non)linear PDEs. What the PDE evolution block consists of can be seen in Fig. 2

One way to make the networks equivariant or invariant is to simply train them on more data. One could take the training dataset and augment it with translated, rotated, and reflected versions of the original images. This approach, however, is undesirable: invariance or equivariance is still not guaranteed, and the training takes longer. It would be better if the networks were inherently invariant or equivariant by design. This avoids a waste of network capacity, guarantees invariance or equivariance, and increases performance, see for example [1].

More specifically, many computer vision and image processing problems are tackled with convolutional neural networks (CNNs) [2,3,4]. Convolutional neural networks have the property that they inherently respect, to some degree, translation symmetries. CNNs do not, however, take into account rotational or reflection symmetries. Cohen and Welling introduced group equivariant convolutional neural networks (G-CNNs) in [5] and designed a classification network that is inherently invariant under 90 degree rotations, integer translations, and vertical/horizontal reflections. Much work is being done on invariant/equivariant networks that exploit inherent symmetries; a non-exhaustive list is [1, 6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]. The idea of including geometric priors, such as symmetries, into the design of neural networks is called ‘Geometric Deep Learning’ in [27].

In [28], partial differential equation (PDE)-based G-CNNs are presented, aptly called PDE-G-CNNs. In fact, G-CNNs are shown to be a special case of PDE-G-CNNs (if one restricts the PDE-G-CNNs only to convection, using many transport vectors [28, Sec. 6]). With PDE-G-CNNs, the usual nonlinearities that are present in current networks, such as the ReLU activation function and max-pooling, are replaced by solvers for specifically chosen nonlinear evolution PDEs. Figure 1 illustrates the difference between a traditional CNN layer and a PDE-G-CNN layer.

The PDEs that are used in PDE-G-CNNs are not chosen arbitrarily: they come directly from the world of geometric image analysis, and thus their effects are geometrically interpretable. This makes PDE-G-CNNs more geometrically meaningful and interpretable than traditional CNNs. Specifically, the PDEs considered are diffusion, convection, dilation, and erosion. These 4 PDEs correspond to the common notions of smoothing, shifting, max pooling, and min pooling. They are solved by linear convolutions, resamplings, and so-called morphological convolutions. Figure 2 illustrates the basic building block of a PDE-G-CNN.

Fig. 2

Overview of a PDE evolution block. Convection is solved by resampling, diffusion is solved by a linear group convolution with a certain kernel [28, Sec. 5.2], and dilation and erosion are solved by morphological group convolutions (3) with a morphological kernel (1)

One shared property of G-CNNs and PDE-G-CNNs is that the input data usually needs to be lifted to a higher dimensional space. Take, for example, the case of image segmentation with a convolutional neural network where we model/idealize the images as real-valued functions on \(\mathbb {R}^2\). If we keep the data as functions on \(\mathbb {R}^2\) and want the convolutions within the network to be equivariant, then the only convolutions that are allowed are those with isotropic kernels [29, p. 258]. This type of shortcoming generalizes to other symmetry groups as well [12, Thm. 1]. One can imagine that this constraint is too restrictive to work with, and that is why we lift the image data.

Within the PDE-G-CNN framework, the input images are considered real-valued functions on \(\mathbb {R}^d\), the desired symmetries are represented by the Lie group of roto-translations SE(d), and the data is lifted to the homogeneous space of d dimensional positions and orientations \(\mathbb {M}_d\). It is on this higher dimensional space that the evolution PDEs are defined, and the effects of diffusion, dilation, and erosion are completely determined by the Riemannian metric tensor field \(\mathcal {G}\) that is chosen on \(\mathbb {M}_d\). If this Riemannian metric tensor field \(\mathcal {G}\) is left-invariant, the overall processing is equivariant; this follows by combining techniques in [30, Thm. 21, Chpt. 4], [31, Lem. 3, Thm. 4].

The Riemannian metric tensor field \(\mathcal {G}\) we will use in this article is left-invariant and determined by three nonnegative parameters: \(w_1\), \(w_2\), and \(w_3\). The definition can be found in the preliminaries, Sect. 2, Equation (4). It is exactly these three parameters that are optimized during the training of a PDE-G-CNN. Intuitively, the parameters correspondingly regulate the cost of main spatial, lateral spatial, and angular motion. An important quantity in the analysis of this paper is the spatial anisotropy \(\zeta := \frac{w_2}{w_1}\), as will become clear later.

In this article, we only consider the two-dimensional case, i.e., \(d=2\). In this case, the elements of both \(\mathbb {M}_2\) and SE(2) can be represented by three real numbers: \((x,y,\theta ) \in \mathbb {R}^2 \times [0,2\pi )\). In the case of \(\mathbb {M}_2\), the x and y represent a position and \(\theta \) represents an orientation. Throughout the article, we take \(\textbf{p}_0:= (0,0,0) \in \mathbb {M}_2\) as our reference point in \(\mathbb {M}_2\). In the case of SE(2), we have that x and y represent a translation and \(\theta \) a rotation.

As already stated, within the PDE-G-CNN framework images are lifted to the higher dimensional space of positions and orientations \(\mathbb {M}_d\). There are a multitude of ways of achieving this, but there is one very natural way to do it: the orientation score transform [30, 32,33,34]. In this transform, we pick a point \((x,y) \in \mathbb {R}^2\) in an image and determine how well a certain orientation \(\theta \in [0, 2\pi )\) fits the chosen point. In Fig. 3 an example of an orientation score is given. We refer to [34, Sec. 2.1] for a summary of how an orientation score transform works.

Fig. 3

An example of an image together with its orientation score. We can see that the image, a real-valued function on \(\mathbb {R}^2\), is lifted to an orientation score, a real-valued function on \(\mathbb {M}_2\). Notice that the lines that are crossing in the left image are disentangled in the orientation score

Inspiration for using orientation scores comes from biology. The Nobel laureates Hubel and Wiesel found that many cells in the visual cortex of cats have a preferred orientation [35, 36]. Moreover, a neuron that fires for a specific orientation excites neighboring neurons that have an “aligned” orientation. Petitot and Citti-Sarti proposed a model [37, 38] for the distribution of the orientation preference and this excitation of neighbors based on sub-Riemannian geometry on \(\mathbb {M}_2\). They relate the phenomenon of preference of aligned orientations to the concept of association fields [39], which model how a specific local orientation places expectations on surrounding orientations in human vision. Figure 4 provides an impression of such an association field.

Fig. 4

Association field lines from neurogeometry [37, Fig. 43], [39, Fig. 16]. Such association field lines can be well approximated by spatially projected sub-Riemannian geodesics in \(\mathbb {M}_2\) [37, 38, 40, 41, 42, Fig. 17]

As shown in [42, Fig. 17], association fields are closely approximated by (projected) sub-Riemannian geodesics in \(\mathbb {M}_2\) for which optimal synthesis has been obtained by Sachkov and Moiseev [43, 44]. Furthermore, in [45] it is shown that the Riemannian geodesics in \(\mathbb {M}_2\) converge to the sub-Riemannian geodesics by increasing the spatial anisotropy \(\zeta \) of the metric. This shows that in practice one can approximate the sub-Riemannian model by Riemannian models. Figure 5 shows the relation between association fields and sub-Riemannian geometry in \(\mathbb {M}_2\).

Fig. 5

A visualization of the exact Riemannian distance d, and its relation with association fields. In Fig. 5a, we see isocontours of \(d(\textbf{p}_0, \cdot )\) in \(\mathbb {M}_2\), and on the bottom we see the min-projection over \(\theta \) of these contours (thus we selected the minimal ending angle in contrast to Fig. 4). The domain of the plot is \([-3,3]^2\times [-\pi ,\pi ) \subset \mathbb {M}_2\). The chosen contours are \(d = 0.5, 1, 1.5, 2\), and 2.5. The metric parameters are \((w_1,w_2,w_3)=(1,64,1)\). Due to the very high spatial anisotropy, we approach the sub-Riemannian setting. In Fig. 5b, we see the same min-projection together with some corresponding spatially projected geodesics

Fig. 6

One sample of the Lines dataset. In Fig. 6a, we see the input, in Fig. 6b the perceived curve that we consider as ground-truth (as the input is constructed by interrupting the ground-truth line and adding random local orientations)

Fig. 7

The overall architecture for a PDE-G-CNN performing line completion on the Lines data set. Note how the input image is lifted to an orientation score that lives in the higher dimensional space \(\mathbb {M}_2\), run through PDE-G-CNN layers (Figs. 1 and 2), and afterwards projected back down to \(\mathbb {R}^2\). Usually this projection is done by taking the maximum value of a feature map over the orientations \(\theta \), for every position \((x,y) \in \mathbb {R}^2\)

Fig. 8

Visualization of how a PDE-G-CNN and CNN incrementally complete a line throughout their layers. The first two rows are of a PDE-G-CNN, the second two rows of a CNN. The first column is the input, the last column the output. The intermediate columns are a representative selection of feature maps from the output of the respective CNN or PDE layer (Fig. 1). The feature maps of the PDE-G-CNN live in \(\mathbb {M}_2\), but for clarity we only show the max-projection over \(\theta \). Within the feature maps of the PDE-G-CNN association fields from neurogeometry [37, 39, 46] become visible as network depth increases. Such merging of association fields is not visible in the feature maps of the CNN. This observation is consistent throughout different inputs

The relation between association fields and Riemannian geometry on \(\mathbb {M}_2\) directly extends to a relation between dilation/erosion and association fields. Namely, performing dilation on an orientation score in \(\mathbb {M}_2\) is similar to extending a line segment along its association field lines. Similarly, performing erosion is similar to sharpening a line segment perpendicular to its association field lines. This makes dilation/erosion the perfect candidate for a task such as line completion.

In the line completion problem, the input is an image containing multiple line segments, and the desired output is an image of the line that is “hidden” in the input image. Figure 6 shows such an input and desired output. This is also what David Field et al. studied in [39]. We anticipate that PDE-G-CNNs outperform classical CNNs in the line completion problem due to PDE-G-CNNs being able to dilate and erode. To investigate this, we made a synthetic dataset called “Lines” consisting of grayscale \(64\times 64\) pixel images, together with their ground-truth line completion. In Fig. 7, a complete abstract overview of the architecture of a PDE-G-CNN performing line completion is visualized. Figure 8 illustrates how a PDE-G-CNN and CNN incrementally complete a line throughout their layers.

In Proposition 1, we show that solving the dilation and erosion PDEs can be done by performing a morphological convolution with a morphological kernel \(k_t^{\alpha }: \mathbb {M}_2 \rightarrow \mathbb {R}_{\ge 0}\), which is easily expressed in the Riemannian distance \(d=d_{\mathcal {G}}\) on the manifold:

$$\begin{aligned} k_t^{\alpha }(\textbf{p})=\frac{t}{\beta } \left( \frac{d_{\mathcal {G}}(\textbf{p}_0,\textbf{p})}{t}\right) ^{\beta }. \end{aligned}$$
(1)

Here \(\textbf{p}_0 = (0,0,0)\) is our reference point in \(\mathbb {M}_2\), and time \(t>0\) controls the amount of erosion and dilation. Furthermore, \(\alpha >1\) controls the “softness” of the max and min-pooling, with \(\frac{1}{\alpha }+\frac{1}{\beta }=1\). Erosion is done through a direct morphological convolution (3) with this specific kernel. Dilation is solved in a slightly different way but again with the same kernel (Proposition 1 in Sect. 3 will explain the details).

And this is where a problem arises: calculating the exact distance d on \(\mathbb {M}_2\) required in (1) is computationally expensive [47]. To alleviate this issue, we resort to estimating the true distance d with computationally efficient approximative distances, denoted throughout the article by \(\rho \). We then use such a distance approximation within (1) to create a corresponding approximative morphological kernel, and in turn use this to efficiently calculate the effect of dilation and erosion.

In [28], one such distance approximation is used: the logarithmic distance estimate \(\rho _c\) which uses the logarithmic coordinates \(c^i\) (8). In short, \(\rho _c(\textbf{p})\) is equal to the Riemannian length of the exponential curve that connects \(\textbf{p}_0\) to \(\textbf{p}\). The formal definition will follow in Sect. 4. In Fig. 9 an impression of \(\rho _c\) is given.

Clearly, an error is made when the effect of erosion and dilation is calculated with an approximative morphological kernel. As a morphological kernel is completely determined by its corresponding (approximative) distance, it follows that one can analyze the error by analyzing the difference between the exact distance d and approximative distance \(\rho \) that is used.

Although it is shown in [28] that \(d \le \rho _c\), no concrete bounds are given, apart from the asymptotic \( \rho _c^2 \le d^2 + \mathcal {O}(d^4) \). This motivates us to do a more in-depth analysis of the quality of the distance approximations.

We introduce a variation on the logarithmic estimate \(\rho _c\) called the half-angle distance estimate \(\rho _b\), and analyze it. The half-angle approximation uses not the logarithmic coordinates but the half-angle coordinates \(b^i\). The definition of these is also given later (28). In practice, \(\rho _c\) and \(\rho _b\) do not differ much, but analyzing \(\rho _b\) is much easier!

The main theorem of the paper, Theorem 1, collects new theoretical results that describe the quality of using the half-angle distance approximation \(\rho _b\) for solving dilation and erosion in practice. It relates the approximative morphological kernel \(k_b\) corresponding to \(\rho _b\) to the exact kernel k (1).

Both the logarithmic estimate \(\rho _c\) and the half-angle estimate \(\rho _b\) approximate the true Riemannian distance d quite well in certain cases. One of these cases is when the Riemannian metric has a low spatial anisotropy \(\zeta \). We can show this visually by comparing the isocontours of the exact and approximative distances. However, interpreting and comparing these surfaces can be difficult. This is why we have decided to additionally plot multiple \(\theta \)-isocontours of these surfaces. In Fig. 10, one such plot can be seen; it illustrates how such plots must be interpreted.

Fig. 9

A visualization of \(\rho _c\), similar to Fig. 5. In Fig. 9a, we see multiple contours of \(\rho _c\), and on the bottom we see the min-projection over \(\theta \). The metric parameters are \((w_1,w_2,w_3)=(1,4,1)\). In Fig. 9b, we see the same min-projection together with some corresponding spatially projected exponential curves. Note the similarity to Fig. 4

Fig. 10

In grey, the isocontour \(d=2.5\) is plotted. The metric parameters are \((w_1,w_2,w_3)=(1,8,1)\). For \(\theta = k\pi /10\) with \( k = -10,\dots ,10 \), the isocontours are drawn and projected onto the bottom of the figure. The same kind of visualization is used in Tables 1 and 2

Table 1 The balls of the exact distance d and approximative distance \(\rho _b\) in the isotropic and low anisotropic case. The radius of the balls is set to \(r = 2.5\). The domain of the plots is \([-3,3]\times [-3,3]\times [-\pi ,\pi )\). We fix \(w_1=w_3=1\) throughout the plots and vary \(w_2\). For \(\theta = k\pi /10\) with \( k = -10,\dots ,10 \) the isocontours are drawn, similar to Fig. 10
Table 2 The same as Table 1 but in the high spatially anisotropic case. Alongside the approximation \(\rho _b\) the sub-Riemannian distance approximation \(\rho _{b,sr}\) is plotted with \(\nu = 1.6\). We see that the isocontours of \(\rho _b\) are too “thin” compared to the isocontours of d. The isocontours of \(\rho _{b,sr}\) are better in this respect

In Table 1, a spatially isotropic case \(\zeta = 1\) and a low-anisotropy case \(\zeta = 2\) are visualized. Note that \(\rho _b\) approximates d well in these cases. In fact, \(\rho _b\) is exactly equal to the true distance d in the spatially isotropic case, which is not true for \(\rho _c\).

Both the logarithmic and the half-angle approximation fail specifically in the high spatial anisotropy regime, for example when \(\zeta = 8\). The first two columns of Table 2 show that, indeed, \(\rho _b\) is no longer a good approximation of the exact distance d. For this reason, we introduce a novel sub-Riemannian distance approximation \(\rho _{b, sr}\), which is visualized in the third column of Table 2.

Finally, we propose an approximative distance \(\rho _{com}\) that carefully combines the Riemannian and sub-Riemannian approximations into one. This combined approximation automatically switches to the estimate that is more appropriate depending on the spatial anisotropy, and hence covers both the low and high anisotropy regimes. Using the corresponding morphological kernel of \(\rho _{com}\) to solve erosion and dilation, we obtain more accurate (and still tangible) solutions of the nonlinear parts in the PDE-G-CNNs.

For every distance approximation (listed in Sect. 4), we perform an empirical analysis in Sect. 6 by seeing how the estimate changes the performance of the PDE-G-CNNs when applied to two datasets: the Lines dataset and the publicly available DCA1 dataset.

1.1 Contributions

In Proposition 1, we summarize how the nonlinear units in PDE-G-CNNs (described by morphological PDEs) are solved using morphological kernels and convolutions, which provides sufficient and essential background for the discussions and results in this paper.

The key contributions of this article are:

  • Theorem 1 summarizes our mathematical analysis of the quality of the half-angle distance approximation \(\rho _b\) and its corresponding morphological kernel \(k_b\) in PDE-G-CNNs. We do this by comparing \(k_b\) to the exact morphological kernel k. Globally, one can show that they both carry the same symmetries, and that for low spatial anisotropies \(\zeta \) they are almost indistinguishable. Furthermore, we show that locally both kernels are similar through an upper bound on the relative error. This improves upon results in [28, Lem. 20].

  • Table 2 demonstrates qualitatively that \(\rho _b\) becomes a poor approximation when the spatial anisotropy is high \(\zeta \gg 1\). In Corollary 4, we underpin this theoretically and in Sect. 6.1 we validate this observation numerically. This motivates the use of a sub-Riemannian approximation when \(\zeta \) is large.

  • In Sect. 4, we introduce and derive a novel sub-Riemannian distance approximation \(\rho _{sr}\) that overcomes difficulties in previously existing sub-Riemannian kernel approximations [48]. Subsequently, we propose our approximation \(\rho _{com}\) that combines the Riemannian and sub-Riemannian approximations into one that automatically switches to the approximation that is more appropriate depending on the metric parameters.

  • Figures 16 and 19 show that PDE-G-CNNs perform just as well as, and sometimes better than, G-CNNs and CNNs on the DCA1 and Lines datasets, while having the fewest parameters. Figures 20 and 17 depict an evaluation of the performance of PDE-G-CNNs when using the different distance approximations, again on the DCA1 and Lines datasets. We observe that the new approximation \(\rho _{b,com}\) provides the best results.

Our theoretical contributions are also relevant outside the context of geometric deep learning. Namely, they also apply to general geometric image processing [48], neurogeometry [37, 38], and robotics [49, Sec. 6.8.4].

In addition, Figs. 4, 5, 9 and 8 show a connection between the PDE-G-CNN framework and the theory of association fields from neurogeometry [37, 39]. Thereby, PDE-G-CNNs reveal improved geometrical interpretability in comparison with existing convolutional neural networks. In Appendix 1, we further clarify the geometrical interpretability.

1.2 Outline

In Sect. 2, a short overview of the necessary mathematical preliminaries is given. Section 3 collects some known results on the exact solution of erosion and dilation on the homogeneous space of two-dimensional positions and orientations \(\mathbb {M}_2\), and motivates the use of morphological kernels. In Sect. 4, all approximative distances are listed. The approximative distances give rise to corresponding approximative morphological kernels. The main theorem of this paper can be found in Sect. 5 and consists of three parts, the proofs of which can be found in the relevant subsections. The main theorem mostly concerns itself with the analysis of the approximative morphological kernel \(k_b\). Experiments with the various approximative kernels are done and the results can be found in Sect. 6. Finally, we end the paper with a conclusion in Sect. 7.

2 Preliminaries

Coordinates on SE(2) and \(\mathbb {M}_2\). Let \(G = SE(2) = \mathbb {R}^2 \rtimes SO(2)\) be the two-dimensional rigid body motion group. We identify elements \(g \in G\) with \(g \equiv (x,y,\theta ) \in \mathbb {R}^2 \times \mathbb {R}/(2\pi \mathbb {Z})\), via the isomorphism \(SO(2) \cong \mathbb {R}/(2\pi \mathbb {Z})\). Furthermore, we always use the small-angle identification \( \mathbb {R}/(2\pi \mathbb {Z}) = [-\pi , \pi )\).

For \(g_1=(x_1, y_1, \theta _1)\), \(g_2 = (x_2, y_2, \theta _2) \in SE(2)\) we have the group product

$$\begin{aligned} \begin{aligned} g_1 g_2:= (&x_1 + x_2 \cos \theta _1 - y_2 \sin \theta _1, \\&y_1 + x_2 \sin \theta _1 + y_2 \cos \theta _1, \\&\theta _1 + \theta _2 {{\,\textrm{mod}\,}}2\pi ), \end{aligned} \end{aligned}$$

and the identity is \(e = (0,0,0)\). The rigid body motion group acts on the homogeneous space of two-dimensional positions and orientations \(\mathbb {M}_{2} = \mathbb {R}^2 \times S^1 \subseteq \mathbb {R}^2 \times \mathbb {R}^2\) by the left-action \(\odot \):

$$\begin{aligned} (\textbf{x},\textbf{R}) \odot (\textbf{y},\textbf{n})= (\textbf{x}+ \textbf{R}\textbf{y},\textbf{R}\textbf{n}), \end{aligned}$$

with \((\textbf{x},\textbf{R}) \in SE(2)\) and \((\textbf{y},\textbf{n}) \in \mathbb {M}_2\). If context allows it, we may omit writing \(\odot \) for conciseness. By choosing the reference element \(\textbf{p}_0 = (0,0,(1,0)) \in \mathbb {M}_2\), we have:

$$\begin{aligned} (x,y,\theta ) \odot \textbf{p}_0 = (x,y,(\cos \theta , \sin \theta )). \end{aligned}$$
(2)

This mapping is a diffeomorphism and allows us to identify SE(2) and \(\mathbb {M}_2\). Thereby we will also freely use the \((x,y,\theta )\) coordinates on \(\mathbb {M}_2\).
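For concreteness, a minimal NumPy sketch of the group product, inverse, and left action above (the function names and checks are our own, not part of any released PDE-G-CNN code):

```python
import numpy as np

def se2_product(g1, g2):
    """Group product on SE(2) in (x, y, theta) coordinates, cf. the formula above."""
    x1, y1, t1 = g1
    x2, y2, t2 = g2
    return np.array([x1 + x2 * np.cos(t1) - y2 * np.sin(t1),
                     y1 + x2 * np.sin(t1) + y2 * np.cos(t1),
                     np.mod(t1 + t2 + np.pi, 2 * np.pi) - np.pi])  # wrap to [-pi, pi)

def se2_inverse(g):
    """Inverse g^{-1} of g = (x, y, theta)."""
    x, y, t = g
    return np.array([-x * np.cos(t) - y * np.sin(t),
                      x * np.sin(t) - y * np.cos(t),
                     -t])

def se2_action(g, p):
    """Left action g ⊙ p of g ∈ SE(2) on p = (position, orientation) ∈ M_2."""
    x, y, t = g
    pos, ori = p
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return (np.array([x, y]) + R @ pos, R @ ori)

# Check the identification (2): (x, y, θ) ⊙ p_0 = (x, y, (cos θ, sin θ)).
p0 = (np.zeros(2), np.array([1.0, 0.0]))
g = np.array([0.3, -1.2, 0.7])
pos, ori = se2_action(g, p0)
assert np.allclose(pos, g[:2]) and np.allclose(ori, [np.cos(g[2]), np.sin(g[2])])
assert np.allclose(se2_product(g, se2_inverse(g)), [0.0, 0.0, 0.0])
```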

Morphological group convolution. Given functions \(f_1,f_2:\mathbb {M}_2 \rightarrow \mathbb {R}\), we define their morphological convolution (or ‘infimal convolution’) [50, 51] by

$$\begin{aligned} (f_1 \mathbin {\square } f_2)(\textbf{p})= \inf \limits _{g \in G} \left\{ f_1(g^{-1} \textbf{p}) + f_2(g \, \textbf{p}_0)\right\} \end{aligned}$$
(3)

Left-invariant (co-)vector fields on \(\mathbb {M}_2\). Throughout this paper, we shall rely on the following basis of left-invariant vector fields:

$$\begin{aligned} \begin{aligned} \mathcal {A}_{1}&= \cos \theta \partial _x + \sin \theta \partial _y, \\ \mathcal {A}_{2}&= -\sin \theta \partial _x + \cos \theta \partial _y, \text { and }\\ \mathcal {A}_{3}&= \partial _{\theta }. \end{aligned} \end{aligned}$$

The dual frame \(\omega ^i\) is given by \(\langle \omega ^i, \mathcal {A}_{j}\rangle =\delta ^{i}_j\), i.e.,

$$\begin{aligned} \begin{aligned} \omega ^1&= \cos \theta \textrm{d}x + \sin \theta \textrm{d}y, \\ \omega ^2&= -\sin \theta \textrm{d}x +\cos \theta \textrm{d}y, \text { and } \\ \omega ^3&= \textrm{d}\theta . \end{aligned} \end{aligned}$$
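As a small numerical sanity check (with our own helper names), the duality \(\langle \omega ^i, \mathcal {A}_{j}\rangle =\delta ^{i}_j\) can be verified by expressing the frame and dual frame in the coordinate basis \((\partial _x, \partial _y, \partial _\theta )\):

```python
import numpy as np

def frame(p):
    """Left-invariant vector fields A_1, A_2, A_3 at p = (x, y, θ),
    as columns expressed in the coordinate basis (∂_x, ∂_y, ∂_θ)."""
    th = p[2]
    return np.array([[np.cos(th), -np.sin(th), 0.0],
                     [np.sin(th),  np.cos(th), 0.0],
                     [0.0,         0.0,        1.0]])

def coframe_components(p, pdot):
    """Components (⟨ω^1, ṗ⟩, ⟨ω^2, ṗ⟩, ⟨ω^3, ṗ⟩) of a tangent vector
    pdot = (dx, dy, dθ) at p."""
    th = p[2]
    dx, dy, dth = pdot
    return np.array([ np.cos(th) * dx + np.sin(th) * dy,
                     -np.sin(th) * dx + np.cos(th) * dy,
                      dth])

# Duality check ⟨ω^i, A_j⟩ = δ^i_j at an arbitrary point.
p = np.array([0.3, -0.7, 0.9])
A = frame(p)
M = np.array([coframe_components(p, A[:, j]) for j in range(3)])
assert np.allclose(M, np.eye(3))
```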

Metric tensor fields on \(\mathbb {M}_2\). We consider the following left-invariant metric tensor fields:

$$\begin{aligned} \begin{aligned} \mathcal {G}= \sum _{i=1}^{3} w_i^2 \ \omega ^{i} \otimes \omega ^i \end{aligned} \end{aligned}$$
(4)

and write \(\Vert {\dot{\textbf{p}}}\Vert =\sqrt{\mathcal {G}_{\textbf{p}}({\dot{\textbf{p}}},{\dot{\textbf{p}}})}\). Here, \(w_i > 0\) are the metric parameters. We also use the dual norm \(\Vert {\hat{\textbf{p}}}\Vert _* = \sup \limits _{{{\dot{\textbf{p}}}} \in T_\textbf{p}\mathbb {M}_2} \frac{\left\langle {{\dot{\textbf{p}}}}, {\hat{\textbf{p}}} \right\rangle }{\Vert {{\dot{\textbf{p}}}}\Vert }\). We will assume, without loss of generality, that \(w_2 \ge w_1\) and introduce the ratio

$$\begin{aligned} \zeta := \frac{w_2}{w_1} \ge 1 \end{aligned}$$
(5)

that is called the spatial anisotropy of the metric.

Distances on \(\mathbb {M}_2\). The left-invariant metric tensor field \(\mathcal {G}\) on \(\mathbb {M}_2\) induces a left-invariant distance (‘Riemannian metric’) \(d:\mathbb {M}_{2} \times \mathbb {M}_2 \rightarrow \mathbb {R}_{\ge 0}\) by

$$\begin{aligned} d_{\mathcal {G}}(\textbf{p},\textbf{q})= \inf _{\gamma \in \Gamma _t(\textbf{p},\textbf{q})}\left( L_{\mathcal {G}}(\gamma ):= \int _0^t \Vert {\dot{\gamma }}(s)\Vert _{\mathcal {G}}\, \textrm{d}s \right) , \end{aligned}$$
(6)

where \(\Gamma _t(\textbf{p}, \textbf{q})\) is the set of piecewise \(C^1\)-curves \(\gamma \) in \(\mathbb {M}_2\) with \(\gamma (0)=\textbf{p}\) and \(\gamma (t)=\textbf{q}\). The right-hand side does not depend on \(t>0\), and we may set \(t=1\).

If no confusion can arise, we omit the subscript \(\mathcal {G}\) and write \(d, L, \Vert \cdot \Vert \) for short. The distance being left-invariant means that for all \(g\in SE(2)\) and \(\textbf{p},\textbf{q} \in \mathbb {M}_2\) one has \(d(\textbf{p},\textbf{q})=d(g \textbf{p},g \textbf{q})\). We will often use the shorthand notation \(d(\textbf{p}):=d(\textbf{p}, \textbf{p}_0)\).

We often consider the sub-Riemannian case arising when \(w_2 \rightarrow \infty \). Then we have “infinite cost” for sideways motion and the only “permissible” curves \(\gamma \) are the ones for which \({{\dot{\gamma }}}(t) \in H\) where \(H:= \text {span}\{\mathcal {A}_1, \mathcal {A}_3\} \subset T\mathbb {M}_{2}\). This gives rise to a new notion of distance, namely the sub-Riemannian distance \(d_{sr}\):

$$\begin{aligned} d_{sr}(\textbf{p},\textbf{q})= \inf _{\begin{array}{c} \gamma \in \Gamma _t(\textbf{p},\textbf{q}), \\ {{\dot{\gamma }}} \in H \end{array}} L_{\mathcal {G}}(\gamma ). \end{aligned}$$
(7)

One can show rigorously that when \(w_2 \rightarrow \infty \) the Riemannian distance d tends to the sub-Riemannian distance \(d_{sr}\), see for example [45, Thm. 2].

Exponential and Logarithm on SE(2). The exponential map \(\exp (c^1 \partial _x \vert _e + c^2 \partial _y \vert _e + c^3 \partial _\theta \vert _e) = (x,y,\theta ) \in SE(2)\) is given by:

$$\begin{aligned} \begin{aligned} x&= \left( c^1 \cos \tfrac{c^3}{2} - c^2 \sin \tfrac{c^3}{2} \right) {{\,\textrm{sinc}\,}}\tfrac{c^3}{2}, \\ y&= \left( c^1 \sin \tfrac{c^3}{2} + c^2 \cos \tfrac{c^3}{2} \right) {{\,\textrm{sinc}\,}}\tfrac{c^3}{2}, \\ \theta&= c^3 {{\,\textrm{mod}\,}}2\pi . \end{aligned} \end{aligned}$$

And the logarithm: \(\log (x,y,\theta ) = c^1 \partial _x\vert _e + c^2 \partial _y\vert _e + c^3 \partial _\theta \vert _e \in T_eSE(2)\):

$$\begin{aligned} \begin{aligned} c^1&= \frac{x\cos \tfrac{\theta }{2} + y \sin \tfrac{\theta }{2}}{{{\,\textrm{sinc}\,}}\tfrac{\theta }{2}}, \\ c^2&= \frac{-x \sin \tfrac{\theta }{2} + y \cos \tfrac{\theta }{2}}{{{\,\textrm{sinc}\,}}\tfrac{\theta }{2}}, \\ c^3&= \theta . \end{aligned} \end{aligned}$$
(8)

By virtue of equation (2), we will freely use the logarithmic coordinates on \(\mathbb {M}_2\).
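A minimal sketch of the exponential and logarithm in these coordinates (our own helper names; note that NumPy's sinc is the normalized variant, so we wrap it):

```python
import numpy as np

def sinc(x):
    """Unnormalized sinc: sin(x)/x with sinc(0) = 1 (np.sinc is the normalized variant)."""
    return np.sinc(x / np.pi)

def se2_exp(c):
    """exp(c^1 ∂_x|_e + c^2 ∂_y|_e + c^3 ∂_θ|_e) -> (x, y, θ), cf. the formulas above."""
    c1, c2, c3 = c
    s = sinc(c3 / 2)
    return np.array([(c1 * np.cos(c3 / 2) - c2 * np.sin(c3 / 2)) * s,
                     (c1 * np.sin(c3 / 2) + c2 * np.cos(c3 / 2)) * s,
                     np.mod(c3 + np.pi, 2 * np.pi) - np.pi])

def se2_log(g):
    """log(x, y, θ) -> (c^1, c^2, c^3), cf. (8)."""
    x, y, th = g
    s = sinc(th / 2)
    return np.array([( x * np.cos(th / 2) + y * np.sin(th / 2)) / s,
                     (-x * np.sin(th / 2) + y * np.cos(th / 2)) / s,
                     th])

# Round-trip check on a generic element with θ ∈ (-π, π).
g = np.array([0.4, -0.9, 1.3])
assert np.allclose(se2_exp(se2_log(g)), g)
```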

3 Erosion and Dilation

We will be considering the following Hamilton–Jacobi equation on \(\mathbb {M}_2\):

$$\begin{aligned} {\left\{ \begin{array}{ll} \frac{\partial W_\alpha }{\partial t} &{}= \pm \frac{1}{\alpha } \left\| \nabla W_{\alpha } \right\| ^\alpha = \pm \mathcal {H}_{\alpha }(dW_\alpha ) \\ \left. W_\alpha \right| _{t=0} &{}= U, \end{array}\right. } \end{aligned}$$
(9)

with the Hamiltonian \(\mathcal {H}_\alpha : T^*\mathbb {M}_2 \rightarrow \mathbb {R}_{\ge 0}\):

$$\begin{aligned} \mathcal {H}_{\alpha }(\hat{\textbf{p}}) = \mathcal {H}_{\alpha }^{1D}(\Vert \hat{\textbf{p}}\Vert _*) = \frac{1}{\alpha }\Vert \hat{\textbf{p}}\Vert _*^{\alpha }, \end{aligned}$$

and where \(W_\alpha \) denotes the viscosity solution [52] obtained from the initial condition \(U \in C( \mathbb {M}_{2},\mathbb {R})\). Here the \(+\) sign yields a dilation scale space and the \(-\) sign an erosion scale space [50, 51]. If confusion cannot arise, we omit the superscript 1D. Erosion and dilation correspond to min- and max-pooling, respectively. The Lagrangian \(\mathcal {L}_\alpha : T\mathbb {M}_2 \rightarrow \mathbb {R}_{\ge 0}\) corresponding with this Hamiltonian is obtained by taking the Fenchel transform of the Hamiltonian:

$$\begin{aligned} \mathcal {L}_{\alpha }({\dot{\textbf{p}}}) = \mathcal {L}^{1D}_{\alpha }(\Vert {\dot{\textbf{p}}}\Vert ) =\frac{1}{\beta } \Vert {\dot{\textbf{p}}}\Vert ^\beta \end{aligned}$$

with \(\beta \) such that \(\frac{1}{\alpha } + \frac{1}{\beta } = 1\). Again, if confusion cannot arise, we omit the subscript \(\alpha \) and/or superscript 1D. We deviate from our previous work by including the factor \(\frac{1}{\alpha }\) and working with a power of \(\alpha \) instead of \(2\alpha \). We do this because it simplifies the relation between the Hamiltonian and Lagrangian.
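For completeness, a short sketch of this Fenchel transform computation (a standard step that the text leaves implicit): reducing to the scalar variable \(r = \Vert \hat{\textbf{p}}\Vert _*\) gives

$$\begin{aligned} \mathcal {L}_{\alpha }({\dot{\textbf{p}}}) = \sup _{\hat{\textbf{p}}} \left( \langle \hat{\textbf{p}}, {\dot{\textbf{p}}} \rangle - \tfrac{1}{\alpha } \Vert \hat{\textbf{p}}\Vert _*^{\alpha } \right) = \sup _{r \ge 0} \left( r \Vert {\dot{\textbf{p}}}\Vert - \tfrac{1}{\alpha } r^{\alpha } \right) = \tfrac{1}{\beta } \Vert {\dot{\textbf{p}}}\Vert ^{\beta }, \end{aligned}$$

where the supremum is attained at \(r = \Vert {\dot{\textbf{p}}}\Vert ^{1/(\alpha -1)}\) and indeed \(\beta = \tfrac{\alpha }{\alpha -1}\).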

The following proposition collects standard results in terms of the solutions of Hamilton–Jacobi equations on manifolds [53,54,55], thereby generalizing results on \(\mathbb {R}^2\) to \(\mathbb {M}_2\).

Proposition 1

(Solution erosion & dilation) Let \(\alpha > 1\). The viscosity solution \(W_\alpha \) of the erosion PDE (9) is given by

$$\begin{aligned} W_\alpha (\textbf{p},t)&= \inf _{\begin{array}{c} \textbf{q}\in \mathbb {M}_2, \\ \gamma \in \Gamma _t(\textbf{p}, \textbf{q}) \end{array}} U(\textbf{q}) + \int \limits _0^{t} \mathcal {L}_{\alpha }({\dot{\gamma }}(s))\, \textrm{d}s \end{aligned}$$
(10)
$$\begin{aligned}&= \inf _{\textbf{q}\in \mathbb {M}_2} U(\textbf{q}) + t \mathcal {L}^{1D}_\alpha (d(\textbf{p}, \textbf{q})/t) \end{aligned}$$
(11)
$$\begin{aligned}&=(k_t^{\alpha } \mathbin {\square } U)(\textbf{p}) \end{aligned}$$
(12)

where the morphological kernel \(k_t^{\alpha }: \mathbb {M}_{2} \rightarrow \mathbb {R}_{\ge 0}\) is defined as:

$$\begin{aligned} k_{t}^{\alpha }= t \mathcal {L}^{1D}_\alpha (d/t) = \frac{t}{\beta } \left( \frac{d(\textbf{p}_0, \cdot )}{t} \right) ^\beta . \end{aligned}$$
(13)

Furthermore, the Riemannian distance \(d:=d(\textbf{p}_0,\cdot )\) is the viscosity solution of the eikonal PDE

$$\begin{aligned} \left\| \nabla d \right\| ^2 = \sum _{i=1}^3 (\mathcal {A}_{i} d / w_i)^2=1 \end{aligned}$$
(14)

with boundary condition \(d(\textbf{p}_0)=0\). Likewise the viscosity solution of the dilation PDE is

$$\begin{aligned} W_{\alpha }(\textbf{p},t)=-(k_t^{\alpha } \mathbin {\square } -U)(\textbf{p}) \end{aligned}$$
(15)

Proof

It is shown by Fathi in [54, Prop. 5.3] that (10) is a viscosity solution of the Hamilton–Jacobi equation (9) on a complete connected Riemannian manifold without boundary, under some (weak) conditions on the Hamiltonian and with the initial condition U being Lipschitz. In [53, Thm. 2], a similar statement is given but only for compact connected Riemannian manifolds, again under some weak conditions on the Hamiltonian but without any on the initial condition. Next, we employ these existing results and provide a self-contained proof of (11) and (12).

Because we are looking at a specific class of Lagrangians, the solutions can be equivalently written as (11). In [53, Prop. 2], this form can also be found. Namely, the Lagrangian \(\mathcal {L}_\alpha ^{1D}\) is convex for \(\alpha > 1\), so for any curve \(\gamma \in \Gamma _t:= \Gamma _t(\textbf{p}, \textbf{q})\) we have by direct application of Jensen’s inequality (omitting the superscript 1D):

$$\begin{aligned} \mathcal {L}_\alpha \left( \frac{1}{t} \int _0^t \Vert {{\dot{\gamma }}}(s)\Vert \textrm{d}s \right) \le \frac{1}{t} \int _0^t \mathcal {L}_\alpha (\Vert {{\dot{\gamma }}}(s)\Vert )\ \textrm{d}s, \end{aligned}$$

with equality if \(\Vert {{\dot{\gamma }}}\Vert \) is constant. This means that:

$$\begin{aligned} \inf _{\gamma \in \Gamma _t} t \mathcal {L}_\alpha \left( \frac{L(\gamma )}{t} \right) \le \inf _{\gamma \in \Gamma _t} \int _0^t \mathcal {L}_\alpha (\Vert {{\dot{\gamma }}}(s)\Vert )\ \textrm{d}s, \end{aligned}$$
(16)

where \(L(\gamma ):=L_{\mathcal {G}}(\gamma )\), recall (6), is the length of the curve \(\gamma \). Consider the subset of curves with constant speed \({\tilde{\Gamma }}_t = \{ \gamma \in \Gamma _t \mid \Vert {{\dot{\gamma }}}\Vert = L(\gamma )/t\} \subset \Gamma _t\). Optimizing over a subset can never decrease the infimum so we have:

$$\begin{aligned} \inf _{\gamma \in \Gamma _t} \int _0^t \mathcal {L}_\alpha (\Vert {{\dot{\gamma }}}(s)\Vert ) \textrm{d}s \le \inf _{\gamma \in {\tilde{\Gamma }}_t} \int _0^t \mathcal {L}_\alpha \left( \frac{L(\gamma )}{t} \right) \textrm{d}s \end{aligned}$$

The r.h.s. of this equation is equal to the l.h.s. of equation (16), as the length of a curve is independent of its parameterization. Thereby we have equality in (16). By monotonicity of \(\mathcal {L}_\alpha \) on \(\mathbb {R}_{>0}\), we may then conclude that:

$$\begin{aligned} \begin{aligned} \inf _{\gamma \in \Gamma _t} t \mathcal {L}_{\alpha } \left( L(\gamma )/t \right)&= t \mathcal {L}_{\alpha } \left( \inf _{\gamma \in \Gamma _t} L(\gamma )/t \right) \\&= t \mathcal {L}_{\alpha } (d(\textbf{p}, \textbf{q})/t). \end{aligned} \end{aligned}$$

That we can write the solution as (12) is a consequence of the left-invariant metric on the manifold. A similar derivation can be found in [28, Thm. 30]:

$$\begin{aligned} \begin{aligned} W_\alpha (\textbf{p},t)&= \inf _{\textbf{q}\in \mathbb {M}_2} U(\textbf{q}) + t \mathcal {L}_\alpha (d(\textbf{p}, \textbf{q})/t) \\&= \inf _{g \in G} U(g \textbf{p}_0) + t \mathcal {L}_\alpha (d(\textbf{p}, g \textbf{p}_0)/t) \\&= \inf _{g \in G} U(g \textbf{p}_0) + t \mathcal {L}_\alpha (d(g^{-1} \textbf{p}, \textbf{p}_0)/t) \\&= \inf _{g \in G} U(g \textbf{p}_0) + k_t^\alpha (g^{-1} \textbf{p}) \\&= (k_t^\alpha \mathbin {\square } U)(\textbf{p}) \end{aligned} \end{aligned}$$

It is shown in [55, Thm. 6.24] for complete connected Riemannian manifolds that the distance map \( d(\textbf{p}) \) is a viscosity solution of the Eikonal equation (14).

Finally, solutions of erosion and dilation PDEs correspond to each other. If \(W_\alpha \) is the viscosity solution of the erosion PDE with initial condition U, then \(-W_\alpha \) is the viscosity solution of the dilation PDE, with initial condition \(-U\). This means that the viscosity solution of the dilation PDE is given by (15). \(\square \)
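To make the structure of (11)-(12) concrete, the following brute-force sketch evaluates the erosion solution on a small grid for an arbitrary distance function passed in as an argument. It uses left-invariance, \(d(\textbf{p},\textbf{q}) = d(\textbf{p}_0, \textbf{q}^{-1}\textbf{p})\). The crude chart distance used below is only a runnable stand-in; in practice one would use the exact distance or one of the approximations of Sect. 4. All helper names are ours.

```python
import numpy as np
import itertools

def group_inv_prod(q, p):
    """q^{-1} p in SE(2), with q and p given in (x, y, θ) coordinates."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    return np.array([ np.cos(q[2]) * dx + np.sin(q[2]) * dy,
                     -np.sin(q[2]) * dx + np.cos(q[2]) * dy,
                      np.mod(p[2] - q[2] + np.pi, 2 * np.pi) - np.pi])

def erosion(U, grid, dist, t=0.3, alpha=1.3):
    """W(p,t) = min_q U(q) + (t/β)(d(p,q)/t)^β, cf. (11)-(13), with a supplied `dist`."""
    beta = alpha / (alpha - 1.0)
    W = np.empty(len(grid))
    for i, p in enumerate(grid):
        W[i] = min(U[j] + (t / beta) * (dist(group_inv_prod(q, p)) / t) ** beta
                   for j, q in enumerate(grid))
    return W

# Crude stand-in distance on the (x, y, θ) chart; only here to make the sketch run.
toy_dist = lambda p: np.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)

xs = np.linspace(-1.0, 1.0, 5)
ths = np.linspace(-np.pi, np.pi, 8, endpoint=False)
grid = np.array(list(itertools.product(xs, xs, ths)))
U = np.random.rand(len(grid))
W = erosion(U, grid, toy_dist)
assert np.all(W <= U + 1e-12)   # erosion never increases the input (take q = p)
```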

4 Distance Approximations

To calculate the morphological kernel \(k_t^\alpha \) (13), we need the exact Riemannian distance d (6), but calculating this is computationally demanding. To alleviate this problem, we approximate the exact distance \(d(\textbf{p}_0, \cdot )\) with approximative distances, denoted by \(\rho : \mathbb {M}_2 \rightarrow \mathbb {R}_{\ge 0}\), which are computationally cheap. To this end, we define the logarithmic distance approximation \(\rho _c\), as explained in [28, Def. 19] and [56, Def. 6.1.2], by

$$\begin{aligned} \rho _c:= \sqrt{ (w_1 c^1)^2 + (w_2 c^2 )^2 + (w_3 c^3 )^2}. \end{aligned}$$
(17)

Note that all approximative distances \(\rho \) can be extended to something that looks like a metric on \(\mathbb {M}_2\). For example, we can define:

$$\begin{aligned} \rho (g_1 \textbf{p}_0,\ g_2 \textbf{p}_0):= \rho (g_1^{-1} g_2 \textbf{p}_0). \end{aligned}$$

But this is almost always not a true metric in the sense that it does not satisfy the triangle inequality. So in this sense an approximative distance is not necessarily a true distance. However, we will keep referring to them as approximative distances as we only require them to look like the exact Riemannian distance \(d(\textbf{p}_0, \cdot )\).

As already stated in the introduction, Riemannian distance approximations such as \(\rho _c\) begin to fail in the high spatial anisotropy cases \(\zeta \gg 1\). For these situations, we need sub-Riemannian distance approximations. In previous literature, two such sub-Riemannian approximations are suggested. The first one is standard [57, Sec. 6], the second one is a modified smooth version [29, p. 284], also seen in [48, eq. 14]:

$$\begin{aligned}&\sqrt{ \sqrt{\nu w_1^2w_3^2}\left| c^2 \right| + (w_1 c^1)^2 + (w_3 c^3)^2 } \end{aligned}$$
(18)
$$\begin{aligned}&\root 4 \of {\nu w_1^2w_3^2 \left| c^2 \right| ^2 + ((w_1 c^1)^2 + (w_3 c^3)^2)^2} \end{aligned}$$
(19)

In [48], \(\nu \approx 44\) is empirically suggested. Note that the sub-Riemannian approximations rely on the assumption that \(w_2 \ge w_1\).

However, they both suffer from a major shortcoming in the interaction between \(w_3\) and \(c^2\). When we let \(w_3 \rightarrow 0\), both approximations suggest that it becomes arbitrarily cheap to move in the \(c^2\) direction, which is undesirable as this deviates from the exact distance d: moving spatially will always have a cost associated with it, determined by at least \(w_1\).

To make a proper sub-Riemannian distance estimate, we will use the Zassenhaus formula, which is related to the Baker–Campbell–Hausdorff formula:

$$\begin{aligned} e^{t(X + Y)} = e^{tX} e^{tY} e^{-\frac{t^2}{2} \left[ X,Y \right] } e^{\mathcal {O}(t^3)} \dots , \end{aligned}$$
(20)

where we have used the shorthand \(e^x:= \exp (x)\). Filling in \(X = A_1\) and \(Y = A_3\) and neglecting the higher-order terms gives:

$$\begin{aligned} e^{t(A_1 + A_3)} \approx e^{tA_1} e^{tA_3} e^{\frac{t^2}{2} A_2}, \end{aligned}$$
(21)

or equivalently:

$$\begin{aligned} e^{\frac{t^2}{2} A_2} \approx e^{-tA_3} e^{-tA_1} e^{t(A_1 + A_3)}. \end{aligned}$$
(22)

This formula says that one can successively follow exponential curves in the “legal” directions \(\mathcal {A}_1\) and \(\mathcal {A}_3\) to effectively move in the “illegal” direction of \(\mathcal {A}_2\). Taking the lengths of these curves and adding them up gives an approximative upper bound on the sub-Riemannian distance:

$$\begin{aligned} \begin{aligned} d_{sr}(e^{\frac{t^2}{2} A_2})&\lessapprox \left( w_1 + w_3 + \sqrt{w_1^2 + w_3^2} \right) \left| t \right| \\&\le 2\left( w_1 + w_3 \right) \left| t \right| . \end{aligned} \end{aligned}$$
(23)

Substituting \(t \rightarrow \sqrt{2\left| t \right| }\) gives:

$$\begin{aligned} d_{sr}(e^{tA_2}) \lessapprox 2\sqrt{2}\left( w_1 + w_3 \right) \sqrt{\left| t \right| }. \end{aligned}$$
(24)

This inequality, together with the smoothing trick to go from (18) to (19), inspires then the following sub-Riemannian distance approximation:

$$\begin{aligned} \rho _{c, sr}:= \root 4 \of { \left( \nu (w_1 + w_3) \right) ^4 \left| c^2 \right| ^2 + ((w_1 c^1)^2 + (w_3 c^3)^2)^2}, \end{aligned}$$
(25)

for some \(0<\nu <2\sqrt{2}\) s.t. the approximation is tight. We empirically suggest \(\nu \approx 1.6\), based on a numerical analysis that is tangential to [48, Fig. 3]. Notice that this approximation does not break down when we let \(w_3 \rightarrow 0\).

Furthermore, in view of contraction of SE(2) to the Heisenberg group \(H_3\) [29, Sec. 5.2], and the exact fundamental solution [32, eq. 27] of the Laplacian on \(H_3\) (where the norm \(\rho _{c,sr}\) appears squared in the numerator with \(1=w_1=w_3=\nu \)) we expect \(\nu \ge 1\).
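The Zassenhaus-based step (22) can be checked numerically in the standard \(3\times 3\) homogeneous matrix representation of the Lie algebra (our own choice of representation; this is only a sanity check of the approximation order):

```python
import numpy as np
from scipy.linalg import expm

# Generators of the Lie algebra se(2) in the homogeneous 3x3 matrix representation.
A1 = np.array([[0., 0., 1.], [0., 0., 0.], [0., 0., 0.]])   # spatial generator A_1 at e
A2 = np.array([[0., 0., 0.], [0., 0., 1.], [0., 0., 0.]])   # lateral generator A_2 at e
A3 = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 0.]])  # angular generator A_3 at e

t = 0.1
lhs = expm(t**2 / 2 * A2)                                    # e^{(t²/2) A_2}
rhs = expm(-t * A3) @ expm(-t * A1) @ expm(t * (A1 + A3))    # right-hand side of (22)
print(np.max(np.abs(lhs - rhs)))                             # discrepancy is of order t³
```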

Table 3 shows that both the old sub-Riemannian approximation (19) and new approximation (25) are appropriate in cases such as \(w_3=1\). Table 4 shows that the old approximation breaks down when we take \(w_3 = 0.5\), and that the new approximation behaves more appropriate.

Table 3 Same situation and metric parameters as Table 2, i.e., \(w_1 = w_3 = 1\) and \(w_2 = 8\). We see the exact distance d alongside the old sub-Riemannian approximation \(\rho _{b,sr,old}\) (19) and new approximation \(\rho _{b,sr}\) (25). For the old approximation, we chose \(\nu =44\), as suggested in [48], and for the new one \(\nu = 1.6\). We see that in this case both approximations are appropriate
Table 4 Same as Table 3 but then with \(w_1 = 1, w_2 = 8, w_3 = 0.5\). We see that in this case that the old sub-Riemannian approximation \(\rho _{b,sr,old}\) (19) underestimates the true distance and becomes less appropriate. The new approximation (25) is also not perfect but qualitatively better. Decreasing \(w_3\) would exaggerate this effect even further

The Riemannian and sub-Riemannian approximations can be combined into the following newly proposed practical approximation:

$$\begin{aligned} \rho _{c,com}:= \max (l,\ \min (\rho _{c, sr},\ \rho _{c})), \end{aligned}$$
(26)

where \(l: \mathbb {M}_2 \rightarrow \mathbb {R}\) is given by:

$$\begin{aligned} l:= \sqrt{ (w_1 x)^2 + (w_1 y)^2 + (w_3 \theta )^2 }, \end{aligned}$$
(27)

for which we will show, in Lemma 4, that it is a lower bound of the exact distance d.

The most important property of the combined approximation is that it automatically switches between the Riemannian and sub-Riemannian approximations depending on the metric parameters. Namely, the Riemannian approximation is appropriate very close to the reference point \(\textbf{p}_0\), but tends to overestimate the true distance at a moderate distance from it. The sub-Riemannian approximation is appropriate at moderate distances from \(\textbf{p}_0\), but tends to overestimate very close to it, and underestimate far away. The combined approximation is such that we get rid of the weaknesses that the approximations have on their own.

On top of these approximative distances, we also define \(\rho _b\), \(\rho _{b,sr}\), and \(\rho _{b,com}\) by replacing the logarithmic coordinates \(c^i\) by their corresponding half-angle coordinates \(b^i\) defined by:

$$\begin{aligned} b^1{} & {} = x \cos \tfrac{\theta }{2} + y \sin \tfrac{\theta }{2}, \nonumber \\ b^2{} & {} = -x \sin \tfrac{\theta }{2} + y \cos \tfrac{\theta }{2}, \nonumber \\ b^3{} & {} = \theta . \end{aligned}$$
(28)

So, for example, we define \(\rho _b\) as:

$$\begin{aligned} \rho _b:= \sqrt{(w_1 b^1)^2 + (w_2 b^2)^2 + (w_3 b^3)^2}. \end{aligned}$$
(29)

Why we use these coordinates will be explained in Sect. 5.1.

We can define approximative morphological kernels by replacing the exact distance in (13) by any of the approximative distances in this section. To this end we, for example, define \(k_b\) by replacing the exact distance in the morphological kernel k by \(\rho _b\):

$$\begin{aligned} k_{b,t}^\alpha := \frac{t}{\beta } \left( \frac{\rho _b}{t} \right) ^\beta , \end{aligned}$$
(30)

where we recall that \(\frac{1}{\alpha } + \frac{1}{\beta } = 1\) and \(\alpha >1\).
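All approximative distances of this section, and the approximative kernel (30), are cheap closed-form expressions. The following sketch (our own notation; the \(b^i\)-variants of (25)-(27) are used) can, for instance, be plugged into the brute-force erosion sketch given after Proposition 1:

```python
import numpy as np

def half_angle_coords(p):
    """Half-angle coordinates (b^1, b^2, b^3) of p = (x, y, θ), cf. (28)."""
    x, y, th = p
    return np.array([ x * np.cos(th / 2) + y * np.sin(th / 2),
                     -x * np.sin(th / 2) + y * np.cos(th / 2),
                      th])

def rho_b(p, w):
    """Riemannian approximation (29)."""
    b = half_angle_coords(p)
    return np.sqrt((w[0] * b[0]) ** 2 + (w[1] * b[1]) ** 2 + (w[2] * b[2]) ** 2)

def rho_b_sr(p, w, nu=1.6):
    """Sub-Riemannian approximation, cf. (25) with b^i in place of c^i."""
    b = half_angle_coords(p)
    return ((nu * (w[0] + w[2])) ** 4 * b[1] ** 2
            + ((w[0] * b[0]) ** 2 + (w[2] * b[2]) ** 2) ** 2) ** 0.25

def lower_bound_l(p, w):
    """Lower bound l of the exact distance, cf. (27)."""
    x, y, th = p
    return np.sqrt((w[0] * x) ** 2 + (w[0] * y) ** 2 + (w[2] * th) ** 2)

def rho_b_com(p, w, nu=1.6):
    """Combined approximation, cf. (26)."""
    return max(lower_bound_l(p, w), min(rho_b_sr(p, w, nu), rho_b(p, w)))

def kernel_b(p, w, t=1.0, alpha=1.3, rho=rho_b_com):
    """Approximative morphological kernel (30) for a chosen approximative distance ρ."""
    beta = alpha / (alpha - 1.0)
    return (t / beta) * (rho(p, w) / t) ** beta

p, w = np.array([0.5, 0.2, np.pi / 3]), (1.0, 8.0, 1.0)   # spatial anisotropy ζ = 8
print(rho_b(p, w), rho_b_sr(p, w), rho_b_com(p, w), kernel_b(p, w))
```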

5 Main Theorem and Analysis

When the effect of erosion and dilation is calculated with an approximative morphological kernel, an error is made. We are therefore interested in analyzing the behavior of this error. We do this by comparing the approximative morphological kernels with the exact kernel \(k_t^\alpha \) (13). The result of our analysis is summarized in the following theorem. Because none of the inequalities in our main result depend on the time t, we use the short notation \(k^\alpha := k_t^\alpha \), \(k_b^\alpha := k_{b,t}^\alpha \).

Theorem 1

(Quality of approximative morphological kernels) Let \(\zeta := \frac{w_2}{w_1}\) denote the spatial anisotropy, and let \(\beta \) be such that \(\frac{1}{\alpha } + \frac{1}{\beta } = 1\), for some \(\alpha >1\) fixed. We assess the quality of our approximative kernels in three ways:

  • The exact and all approximative kernels have the same symmetries, see Table 5.

  • Globally it holds that:

    $$\begin{aligned} \zeta ^{-\beta } k^\alpha \le k_b^\alpha \le \zeta ^{\beta } k^\alpha , \end{aligned}$$
    (31)

    from which we see that in the case \(\zeta = 1\) we have that \(k^\alpha _b\) is exactly equal to \(k^\alpha \).

  • Locally around \(\textbf{p}_0\) we have:

    $$\begin{aligned} k_b^\alpha \le (1 + \varepsilon )^{\beta /2} k^\alpha . \end{aligned}$$
    (32)

    where

    $$\begin{aligned} \varepsilon := \frac{\zeta ^2 - 1}{2 w_3^2} \zeta ^4 \rho _b^2 + \mathcal {O}(\left| \theta \right| ^3). \end{aligned}$$
    (33)
Table 5 Overview of the fundamental symmetries \(\varepsilon _i\) in half-angle coordinates \(b^i\) and logarithmic coordinates \(c^i\). For example \(\varepsilon _3(c^1, c^2, c^3) = (-c^1, -c^2, c^3)\)

Proof

The proof of the parts of the theorem will be discussed throughout the upcoming subsections.

  • The symmetries are shown in Corollary 1.

  • The global bound (31) is shown in Corollary 3.

  • The local bound (32) is shown in Corollary 5.

\(\square \)

Clearly, as all approximative kernels are solely functions of the corresponding approximative distances, the analysis of the quality of an approximative kernel reduces to analyzing the quality of the approximative distance that is used, and this is exactly what we will do.

In previous work on PDE-G-CNNs, the bound \(d=d(\textbf{p}_0,\cdot ) \le \rho _c\) is proven [28, Lem. 20]. Furthermore, it is shown that around \(\textbf{p}_0\) one has:

$$\begin{aligned} \rho _c^2 \le d^2 + \mathcal {O}(d^4), \end{aligned}$$
(34)

which has the corollary that there exists a constant \(C \ge 1\) such that

$$\begin{aligned} \rho _c \le C d \end{aligned}$$
(35)

on any compact neighborhood of \(\textbf{p}_0\). We improve on these results by:

  • Showing that the approximative distances have the same symmetries as the exact Riemannian distance; Lemma 3.

  • Finding simple global bounds on the exact distance d which can then be used to find global estimates of \(\rho _b\) by d; Lemma 4. This improves upon (35) by finding an expression for the constant C.

  • Estimating the leading term of the asymptotic expansion, and observing that our upper bound of the relative error between \(\rho _b\) and d explodes in the cases \(\zeta \rightarrow \infty \) and \(w_3 \rightarrow 0\); Lemma 7. This improves upon equation (34).

Note, however, that we are not analyzing \(\rho _c\): we will be analyzing \(\rho _b\). This is mainly because the half-angle coordinates are easier to work with: they do not have the \({{\,\textrm{sinc}\,}}\tfrac{\theta }{2}\) factor the logarithmic coordinates have. Using that

$$\begin{aligned} b^1 = c^1 {{\,\textrm{sinc}\,}}\tfrac{\theta }{2},\ b^2 = c^2 {{\,\textrm{sinc}\,}}\tfrac{\theta }{2},\ b^3 = c^3, \end{aligned}$$
(36)

recall (28) and (8), we see that

$$\begin{aligned} {{\,\textrm{sinc}\,}}\tfrac{\theta }{2}\ \rho _c \le \rho _b \le \rho _c, \end{aligned}$$

and thus locally \(\rho _c\) and \(\rho _b\) do not differ much, and results on \(\rho _b\) can be easily transferred to (slightly weaker) results on \(\rho _c\).

5.1 Symmetry Preservation

Symmetries play a major role in the analysis of (sub-)Riemannian geodesics/distances in SE(2). They help to analyze symmetries in Hamiltonian flows [44] and corresponding symmetries in association field models [42, Fig. 11]. There are 8 of them in total, and their relation with the logarithmic coordinates \(c^i\) (Lemma 1) shows that they correspond to inversions of the Lie-algebra basis \(A_i \mapsto -A_i\). The symmetries for the sub-Riemannian setting are explicitly listed in [44, Prop. 4.3]. They can be algebraically generated by the following three symmetries (using the same labeling as [44]):

$$\begin{aligned} \begin{array}{l} \varepsilon ^{2}(x,y,\theta ) = (-x \cos \theta - y \sin \theta , -x \sin \theta + y \cos \theta , \theta ),\\ \varepsilon ^{1}(x,y,\theta ) = (x \cos \theta + y \sin \theta , x \sin \theta - y \cos \theta , \theta ), \text { and } \\ \varepsilon ^{6}(x,y,\theta ) = (x \cos \theta + y \sin \theta , -x \sin \theta + y \cos \theta , -\theta ). \end{array} \end{aligned}$$
(37)

They generate the other four symmetries as follows:

$$\begin{aligned} \begin{array}{l} \varepsilon ^{3}=\varepsilon ^{2} \circ \varepsilon ^1,\ \varepsilon ^{4}=\varepsilon ^{2} \circ \varepsilon ^6,\ \varepsilon ^{7}=\varepsilon ^{1} \circ \varepsilon ^6, \\ \text { and } \varepsilon ^{5}= \varepsilon ^2 \circ \varepsilon ^{1} \circ \varepsilon ^6. \end{array} \end{aligned}$$
(38)

and with \(\varepsilon ^0 = \text {id}\). All symmetries are involutions: \(\varepsilon ^i \circ \varepsilon ^i = \text {id}\). Henceforth, all eight symmetries will be called ‘fundamental symmetries.’ How all fundamental symmetries relate to each other becomes clearer if we write them down in either logarithmic or half-angle coordinates.

Lemma 1

(8 fundamental symmetries) The 8 fundamental symmetries \(\varepsilon _i\), in either half-angle coordinates \(b^i\) or logarithmic coordinates \(c^i\), correspond to sign flips as laid out in Table 5.

Fig. 11

a \(\varepsilon ^2\), b \(\varepsilon ^1\), c \(\varepsilon ^6\). The fixed points of \(\varepsilon ^2\), \(\varepsilon ^1\), and \(\varepsilon ^6\). For \(\varepsilon ^2\) and \(\varepsilon ^1\), only the points within the region \(x^2 + y^2 \le 2^2\) are plotted. For \(\varepsilon ^6\), only the points in the region \(\max (\left| x \right| ,\left| y \right| ) \le 2\) are plotted. The fixed points of \(\varepsilon ^2\), \(\varepsilon ^1\), and \(\varepsilon ^6\) correspond, respectively, to the points in \(\mathbb {M}_2\) that are coradial, cocircular, and parallel to the reference point \(\textbf{p}_0\)

Fig. 12

a Coradial, b Cocircular, c Parallel. An example of points in \(\mathbb {M}_2\) that are coradial, cocircular, and parallel

Proof

We will only show that \(\varepsilon ^2\) flips \(b^1\). All other calculations are done analogously. Pick a point \(\textbf{p}= (x,y,\theta )\) and let \(\textbf{q}= \varepsilon ^2(\textbf{p})\). We now calculate \(b^1(\textbf{q})\):

$$\begin{aligned} \begin{aligned} b^1(\textbf{q}) ={}&x(\textbf{q}) \cos \tfrac{\theta (\textbf{q})}{2} + y(\textbf{q}) \sin \tfrac{\theta (\textbf{q})}{2}\\ =&- (x \cos \theta + y \sin \theta ) \cos \tfrac{\theta }{2} \\&+ (-x \sin \theta + y \cos \theta ) \sin \tfrac{\theta }{2}\\ =&-x (\cos \theta \cos \tfrac{\theta }{2} + \sin \theta \sin \tfrac{\theta }{2} ) \\&- y(\sin \theta \cos \tfrac{\theta }{2} - \cos \theta \sin \tfrac{\theta }{2})\\ =&- x \cos \tfrac{\theta }{2} - y \sin \tfrac{\theta }{2}\\ =&-b^1(\textbf{p}), \end{aligned} \end{aligned}$$

where we have used the trigonometric difference identities of cosine and sine in the second-to-last equality. From the relation between logarithmic and half-angle coordinates (36), we have that the logarithmic coordinates \(c^i\) flip in the same manner under the symmetries. \(\square \)

The fixed points of the symmetries \(\varepsilon ^2\), \(\varepsilon ^1\), and \(\varepsilon ^6\) have an interesting geometric interpretation. The logarithmic and half-angle coordinates, being so closely related to the fundamental symmetries, also carry the same interpretation. Definition 1 introduces this geometric idea and Lemma 2 makes its relation to the fixed points of the symmetries precise. In Fig. 11, the fixed points are visualized, and in Fig. 12 a visualization of these geometric ideas can be seen.

Definition 1

Two points \(\textbf{p}_1=(\textbf{x}_1,\textbf{n}_1)\), \(\textbf{p}_2=(\textbf{x}_{2},\textbf{n}_2)\) of \(\mathbb {M}_{2}\) are called cocircular if there exists a circle, of possibly infinite radius, passing through \(\textbf{x}_1\) and \(\textbf{x}_2\) such that the orientations \(\textbf{n}_1 \in S^1\) and \(\textbf{n}_{2} \in S^1\) are tangent to the circle, at, respectively, \(\textbf{x}_1\) and \(\textbf{x}_2\), in either both the clockwise or both the anti-clockwise direction. Similarly, the points are called coradial if the orientations are normal to the circle in either both the outward or both the inward direction. Finally, two points are called parallel if their orientations coincide.

Co-circularity has a well-known characterization that is often used for line enhancement in image processing, such as tensor voting [58].

Remark 1

Point \(\textbf{p}=(r \cos \phi , r \sin \phi , \theta ) \in \mathbb {M}_2\) is cocircular to the reference point \(\textbf{p}_0=(0,0,0)\) if and only if the double angle equality \(\theta \equiv 2 \phi \mod 2\pi \) holds.

In fact all fixed points of the fundamental symmetries can be intuitively characterized:

Lemma 2

(Fixed Points of Symmetries) Fix reference point \(\textbf{p}_0=(0,0,0) \in \mathbb {M}_2\).

The point \(g \textbf{p}_0\in \mathbb {M}_2\) with \(g \in SE(2)\) is, respectively,

  • coradial to \(\textbf{p}_0\) when

    $$\begin{aligned} c^1(g) = 0 \Leftrightarrow \varepsilon _2(g) = g \Leftrightarrow g \in \exp (\left\langle A_2, A_3 \right\rangle ), \end{aligned}$$
    (39)
  • cocircular to \(\textbf{p}_0\) when

    $$\begin{aligned} c^2(g) = 0 \Leftrightarrow \varepsilon _1(g) = g \Leftrightarrow g \in \exp (\left\langle A_1, A_3 \right\rangle ), \end{aligned}$$
    (40)
  • parallel to \(\textbf{p}_0\) when

    $$\begin{aligned} c^3(g) = 0 \Leftrightarrow \varepsilon _6(g) = g \Leftrightarrow g \in \exp (\left\langle A_1, A_2 \right\rangle ). \end{aligned}$$
    (41)

Proof

We will only show (40); the others are done analogously. We start by writing \(g=(r \cos \phi , r \sin \phi , \theta )\) and calculating that \(g \odot \textbf{p}_0 = (r \cos \phi , r \sin \phi , (\cos \theta , \sin \theta ))\). Then by Remark 1 we know that \(g \textbf{p}_0\) is cocircular to \(\textbf{p}_0\) if and only if \(2\phi = \theta {{\,\textrm{mod}\,}}2\pi \). We can show this is equivalent to \(c^2(g)=0\):

$$\begin{aligned} c^2(g) = 0&\Leftrightarrow b^2(g) = 0 \\&\Leftrightarrow -x \sin \tfrac{\theta }{2} +y \cos \tfrac{\theta }{2}=0\\&\Leftrightarrow -\cos \phi \sin \tfrac{\theta }{2}+\sin \phi \cos \tfrac{\theta }{2}=0\\&\Leftrightarrow \sin (\phi -\tfrac{\theta }{2})=0 \Leftrightarrow 2\phi = \theta {{\,\textrm{mod}\,}}2\pi . \end{aligned}$$

In logarithmic coordinates, \(\varepsilon _1\) is equivalent to:

$$\begin{aligned} \varepsilon _1(c^1, c^2, c^3) = (c^1, -c^2, c^3) \end{aligned}$$

from which we may deduce that \(\varepsilon _1(g) = g\) is equivalent to \(c^2(g) = 0\). If \(c^2(g) = 0\) then \(\log g \in \left\langle A_1, A_3 \right\rangle \) and thus \(g \in \exp (\left\langle A_1, A_3 \right\rangle )\). As for the other way around, it holds by simple computation that:

$$\begin{aligned} c^2(\exp (c^1A_1 + c^3A_3)) = 0 \end{aligned}$$

which shows that \(g \in \exp (\left\langle A_1, A_3 \right\rangle ) \Rightarrow c^2(g) = 0\). \(\square \)

In the important work [44] on sub-Riemannian geometry on SE(2) by Sachkov and Moiseev, it is shown that the exact sub-Riemannian distance \(d_{sr}\) is invariant under the fundamental symmetries \(\varepsilon ^i\). These same symmetries also hold for the Riemannian distance d. Moreover, because the approximative distances use the logarithmic coordinates \(c^i\) and half-angle coordinates \(b^i\), they carry the same symmetries as well. The following lemma makes this precise.

Lemma 3

(Symmetries of the exact distance and all proposed approximations) All exact and approximative (sub)-Riemannian distances (w.r.t. the reference point \(\textbf{p}_0\)) are invariant under all the fundamental symmetries \(\varepsilon _i\).

Proof

By Table 5, one sees that \(\varepsilon ^3, \varepsilon ^4\), and \(\varepsilon ^5\) also generate all symmetries. Therefore, it suffices to show that all distances are invariant under these three symmetries. We first show that the exact distance, in either the Riemannian or sub-Riemannian case, is invariant under them, i.e., \(d(\textbf{p}) = d(\varepsilon ^i(\textbf{p}))\) for \(i \in \{3,4,5\}\). By (38) and (37), one has \(\varepsilon ^3(x,y,\theta )=(-x,-y,\theta )\) and \(\varepsilon ^4(x,y,\theta ) = (-x,y,-\theta )\). Now consider the push forward \(\varepsilon ^3_*\). By direct computation (in \((x,y,\theta )\) coordinates), we have \(\varepsilon ^3_* \left. \mathcal {A}_i \right| _\textbf{p}= \pm \left. \mathcal {A}_i \right| _{\varepsilon ^3(\textbf{p})}\). Because the metric tensor field \(\mathcal {G}\) (4) is diagonal w.r.t. the \(\mathcal {A}_i\) basis, this means that \(\varepsilon ^3\) is an isometry. Similarly, \(\varepsilon ^4\) is an isometry. Being isometries of the metric \(\mathcal {G}\), \(\varepsilon ^3\) and \(\varepsilon ^4\) preserve distance. The \(\varepsilon ^5\) symmetry flips all the signs of the \(c^i\) coordinates, which amounts to Lie algebra inversion: \( -\log g = \log (\varepsilon ^5(g)) \). Taking the exponential on both sides shows that \(g^{-1} = \varepsilon ^5(g)\). By left-invariance of the metric, we have \(d(g \textbf{p}_0, \textbf{p}_0) = d(\textbf{p}_0, g^{-1} \textbf{p}_0)\), which holds in both the Riemannian and sub-Riemannian case, and thus \( d(g\textbf{p}_0) = d(\varepsilon ^5(g\textbf{p}_0)) \). That all approximative distances (both in the Riemannian and sub-Riemannian case) are also invariant under all the symmetries is not hard to see: every \(b^i\) and \(c^i\) term is either squared or taken in absolute value. Flipping signs of these coordinates, recall Lemma 1, therefore has no effect on the approximative distance. \(\square \)

Corollary 1

(All kernels preserve symmetries) The exact kernel and all approximative kernels have the same fundamental symmetries.

Proof

The kernels are direct functions of the exact and approximative distances, recall for example (13), so from Lemma 3 we can immediately conclude that they also carry the 8 fundamental symmetries. \(\square \)

Figure 10 illustrates the previous lemma. The two fundamental symmetries \(\varepsilon ^2\) and \(\varepsilon ^1\) correspond, respectively, to reflecting the isocontours (depicted in colors) along their short and long axes. The \(\varepsilon ^6\) symmetry corresponds to mapping the positive \(\theta \) isocontours to their negative \(\theta \) counterparts. In Fig. 13, one can see an isocontour of \(\rho _b\) together with the symmetry “planes” of \(\varepsilon _2\), \(\varepsilon _1\) and \(\varepsilon _6\).

Fig. 13
figure 13

In grey the isocontour \(\rho _b=2.5\), together with the symmetry “planes” of \(\varepsilon _2\), \(\varepsilon _1\) and \(\varepsilon _6\), as also plotted in Fig. 11. The metric parameters are \((w_1,w_2,w_3)=(1,2,1)\)

5.2 Simple Global Bounds

Next, we provide some basic global lower and upper bounds for the exact Riemannian distance d (6). Recall that the lower bound l plays an important role in the combined approximation \(\rho _{b,com}\) (26) when far from the reference point \(\textbf{p}_0\).

Lemma 4

(Global bounds on distance) The exact Riemannian distance \(d=d(\textbf{p}_0,\cdot )\) is greater than or equal to the following lower bound \(l: \mathbb {M}_2 \rightarrow \mathbb {R}\):

$$\begin{aligned} l:= \sqrt{ (w_1 x)^2 + (w_1 y)^2 + (w_3 \theta )^2 } \le d \end{aligned}$$

and less than or equal to the following upper bounds \(u_1, u_2: \mathbb {M}_2 \rightarrow \mathbb {R}\):

$$\begin{aligned} d \le u_1&:= \sqrt{ (w_2 x)^2 + (w_2 y)^2 + (w_3 \theta )^2 }\\ d \le u_2&:= \sqrt{ (w_1 x)^2 + (w_1 y)^2 } + w_3 \pi \end{aligned}$$

Proof

We will first show \(l \le d\). Consider the following spatially isotropic metric:

$$\begin{aligned} {\tilde{\mathcal {G}}} = w_1^2\ \omega ^1 \otimes \omega ^1 + w_1^2\ \omega ^2 \otimes \omega ^2 + w_3^2 \ \omega ^3 \otimes \omega ^3. \end{aligned}$$

We assumed w.l.o.g. that \(w_1 \le w_2\), so for any vector \(v \in T\mathbb {M}_2\) we have \( \Vert v\Vert _{{\tilde{\mathcal {G}}}} \le \Vert v\Vert _{\mathcal {G}} \). From this, we directly deduce that for any curve \(\gamma \) on \(\mathbb {M}_2\) we have \(L_{{\tilde{\mathcal {G}}}}(\gamma ) \le L_{\mathcal {G}}(\gamma )\). Now consider a length-minimizing curve \(\gamma \) w.r.t. \(\mathcal {G}\) between the reference point \(\textbf{p}_0\) and some end point \(\textbf{p}\). We then have the chain of (in)equalities:

$$\begin{aligned} d_{{\tilde{\mathcal {G}}}}(\textbf{p}) \le L_{{\tilde{\mathcal {G}}}}(\gamma ) \le L_{\mathcal {G}}(\gamma ) = d_{\mathcal {G}}(\textbf{p}) \end{aligned}$$

Furthermore, because the metric \({\tilde{\mathcal {G}}}\) is spatially isotropic it can equivalently be written as:

$$\begin{aligned} {\tilde{\mathcal {G}}} = w_1^2\ dx \otimes dx + w_1^2\ dy \otimes dy + w_3^2 \ d\theta \otimes d\theta , \end{aligned}$$

which is a constant metric on the coordinate covector fields, and thus:

$$\begin{aligned} d_{{\tilde{\mathcal {G}}}}(\textbf{p}) = \sqrt{ (w_1 x)^2 + (w_1 y)^2 + (w_3 \theta )^2 } = l. \end{aligned}$$

Putting everything together gives the desired result \(l \le d\). The bound \(d \le u_1\) follows analogously.

As for \(d \le u_2\), we construct a curve \(\gamma \) whose length \(L(\gamma )\) w.r.t. \(\mathcal {G}\) can be bounded from above by \(u_2\); this shows \(d \le u_2\) by definition of the distance. Pick a destination position and orientation \(\textbf{p}= (\textbf{x}, \textbf{n})\). The curve \(\gamma \) is constructed as follows. We start by rotating the initial orientation \(\textbf{n}_0 = (1,0) \in S^1\) toward the destination position \(\textbf{x}\), i.e., toward \({\hat{\textbf{x}}}:= \frac{\textbf{x}}{r}\) with \(r = \Vert \textbf{x}\Vert = \sqrt{x^2 + y^2}\). This rotation costs \(w_3 a\) for some \(a \ge 0\). Once aligned with \({\hat{\textbf{x}}}\), we move straight toward \(\textbf{x}\); because we are aligned, this costs \(w_1 r\). Having arrived at \(\textbf{x}\), we rotate to the destination orientation \(\textbf{n}\), which costs \(w_3 b\) for some \(b \ge 0\). Altogether, \(L(\gamma ) = w_1 r + w_3 (a+b)\). As constructed, the curve does not necessarily satisfy \(a+b\le \pi \). To fix this, note that we did not have to align with \({\hat{\textbf{x}}}\): we could instead align with \(-{\hat{\textbf{x}}}\) and move backwards toward \(\textbf{x}\), which also costs \(w_1 r\). One of these two options (moving forwards or backwards toward \(\textbf{x}\)) does satisfy \(a+b\le \pi \), and thus \(d \le u_2\). \(\square \)

These bounds are simple but effective: they help us prove a multitude of insightful corollaries.
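The following minimal numerical sketch evaluates these bounds; it assumes, as in the earlier sections, that the half-angle approximation \(\rho _b\) is the weighted norm of the half-angle coordinates \(b^i\) (the function and variable names are ours):

```python
import numpy as np

def bounds_and_rho_b(x, y, theta, w1, w2, w3):
    """Lower bound l and upper bounds u1, u2 of Lemma 4, together with the
    half-angle approximation rho_b (assumed weighted-norm form)."""
    l  = np.sqrt((w1 * x) ** 2 + (w1 * y) ** 2 + (w3 * theta) ** 2)
    u1 = np.sqrt((w2 * x) ** 2 + (w2 * y) ** 2 + (w3 * theta) ** 2)
    u2 = np.sqrt((w1 * x) ** 2 + (w1 * y) ** 2) + w3 * np.pi
    # half-angle coordinates (b^2 as in the proof of Lemma 2, b^3 = theta)
    b1 = x * np.cos(theta / 2) + y * np.sin(theta / 2)
    b2 = -x * np.sin(theta / 2) + y * np.cos(theta / 2)
    b3 = theta
    rho_b = np.sqrt((w1 * b1) ** 2 + (w2 * b2) ** 2 + (w3 * b3) ** 2)
    return l, u1, u2, rho_b

# sanity check of l <= rho_b <= u1 (Corollary 2 below) on random points
rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.uniform(-3, 3, size=2)
    theta = rng.uniform(-np.pi, np.pi)
    l, u1, u2, rho_b = bounds_and_rho_b(x, y, theta, w1=1.0, w2=2.0, w3=1.0)
    assert l - 1e-12 <= rho_b <= u1 + 1e-12
```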

Corollary 2

(Global error distance) Simple manipulations, together with the fact that \(x^2 + y^2 = (b^1)^2 + (b^2)^2\), give the following inequalities between \(l, u_1\) and \(\rho _b\):

$$\begin{aligned} l \le \rho _b \le u_1,\ \frac{1}{\zeta } u_1 \le \rho _b \le \zeta l. \end{aligned}$$

Combining the second chain with the bounds \(l \le d \le u_1\) from Lemma 4 extends it to inequalities between \(\rho _b\) and d:

$$\begin{aligned} \frac{1}{\zeta } d \le \rho _b \le \zeta d \end{aligned}$$
(42)

Remark 2

If \(w_1 = w_2 \Leftrightarrow \zeta = 1\), i.e., in the spatially isotropic case, the lower bound l and the upper bound \(u_1\) coincide and are therefore exact. Because \(\rho _b\) lies between them, it is then exact as well.

Corollary 3

(Global error kernel) Globally the error is independent of time \(t>0\) and is estimated by the spatial anisotropy \(\zeta \ge 1\) (5) as follows:

$$\begin{aligned} \zeta ^{-\beta } k^\alpha \le k_b^\alpha \le \zeta ^{\beta } k^\alpha . \end{aligned}$$

For \(\zeta =1\), there is no error.

Proof

We only prove the second inequality; the first follows analogously.

$$\begin{aligned} \begin{aligned} k_b^\alpha&:= \frac{1}{\beta } (\rho _b/t)^\beta \le \frac{1}{\beta } \left( \zeta d/t \right) ^\beta \\&= \zeta ^{\beta } \left( \frac{1}{\beta } \left( d/t \right) ^\beta \right) = \zeta ^{\beta } k^\alpha \end{aligned} \end{aligned}$$

\(\square \)
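As a small numerical illustration of this sandwich (the kernel expression is taken from the proof above; the chosen values are ours and only for illustration):

```python
def kernel(rho, t, beta):
    """Morphological kernel value (1/beta) * (rho/t)**beta, cf. the proof of Corollary 3."""
    return (rho / t) ** beta / beta

# if (1/zeta) * d <= rho_b <= zeta * d, as in (42), then
#   zeta**(-beta) * k <= k_b <= zeta**beta * k
d, zeta, t, beta = 1.7, 2.0, 0.5, 1.5
rho_b = 1.2 * d                                   # any value in [d/zeta, zeta*d]
k, k_b = kernel(d, t, beta), kernel(rho_b, t, beta)
assert zeta ** (-beta) * k <= k_b <= zeta ** beta * k
```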

The previous result indicates that problems can arise if \(\zeta \rightarrow \infty \), which indeed turns out to be the case:

Corollary 4

(Observing the problem) If we restrict ourselves to \(x=\theta =0\), we have \(u_1 = \rho _b = \rho _c = w_2\left| y \right| \). From this, we deduce that both \(\rho _b\) and \(\rho _c\) necessarily become poor approximations far away from \(\textbf{p}_0\): when \(\zeta> 1 \Leftrightarrow w_2 > w_1\), both approximations exceed the upper bound \(u_2\) sufficiently far from \(\textbf{p}_0\). How quickly this happens is determined by all metric parameters: the approximations \(\rho _b\) and \(\rho _c\) intersect \(u_2\) at \(\left| y \right| = \frac{w_3\pi }{w_2 - w_1}\), or equivalently at \(\rho = \frac{w_3\pi }{1 - \zeta ^{-1}}\). This intersection is visible in Fig. 14 in the higher-anisotropy cases. From this expression, we see that for \(w_3 \rightarrow 0\) or \(\zeta \rightarrow \infty \) the Riemannian distance approximations \(\rho _b\) and \(\rho _c\) deteriorate quickly. We will see exactly the same behavior in Lemma 7 and Remark 3.
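For instance, with the metric parameters of Fig. 14d (\(w_1=w_3=1\), \(w_2=4\)) the crossing point can be computed directly (a short numerical sketch):

```python
import numpy as np

w1, w2, w3 = 1.0, 4.0, 1.0              # metric parameters of Fig. 14d
y_cross   = w3 * np.pi / (w2 - w1)      # |y| at which rho_b = rho_c crosses u2
rho_cross = w3 * np.pi / (1 - w1 / w2)  # the corresponding distance value
print(y_cross, rho_cross)               # ~1.047 and ~4.189; beyond |y| ~ 1.05 the
                                        # approximations exceed the upper bound u2
```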

Lemma 4 is visualized in Figs. 14 and 15. In Fig. 14, we consider the behavior of the exact distance and bounds along the y-axis, that is at \(x=\theta =0\). We have chosen to inspect the y-axis because it consists of points that are hard to reach from the reference point \(\textbf{p}_0\) when the spatial anisotropy is large, which makes it interesting. In contrast, along the x-axis \(l,d,\rho _b,\rho _c, u_1\) and \(w_1\left| x \right| \) all coincide, which makes that axis uninteresting. To provide more insight, we also depict the bounds along the line \(y=x\), see Fig. 15. Observe that in both figures, the exact distance d is indeed always above the lower bound l and below the upper bounds \(u_1\) and \(u_2\).

5.3 Asymptotic Error Expansion

In this section, we provide an asymptotic expansion of the error between the exact distance d and the half-angle distance approximation \(\rho _b\) (Lemma 7). This estimate is then transferred to an error bound between the exact morphological kernel k and the half-angle kernel \(k_b\) (Corollary 5). We also give a formula that determines a region on which the half-angle approximation \(\rho _b\) is appropriate given an a priori tolerance bound (Remark 3).

Lemma 5

Let \(\gamma :[0,1] \rightarrow \mathbb {M}_2\) be a minimizing geodesic from \(\textbf{p}_0\) to \(\textbf{p}\). We have that:

$$\begin{aligned} \rho _b(\textbf{p}) \le d(\textbf{p}) \max _{t \in [0,1]} \Vert d\rho _b\vert _{\gamma (t)} \Vert . \end{aligned}$$
Fig. 14
figure 14

a \(w_2 = 1\), b \(w_2 = 2\), c \(w_2 = 3\), d \(w_2 = 4\). Exact distance and its lower and upper bounds (given in Lemma 4) along the y-axis, i.e., at \(x=\theta =0\), for increasing spatial anisotropy. We keep \(w_1=w_3=1\) and vary \(w_2\). The horizontal axis is y and the vertical axis the value of the distance/bound. Note how the exact distance d starts off linearly with a slope of \(w_2\), and ends linearly with a slope of \(w_1\)

Fig. 15
figure 15

a \(w_2 = 1\), b \(w_2 = 2\), c \(w_2 = 3\), d \(w_2 = 4\). Same setting as Fig. 14 but at \(x=y, \theta =0\). The horizontal axis moves along the line \(x=y\)

Proof

The fundamental theorem of calculus tells us that:

$$\begin{aligned} \int _0^1 (\rho _b \circ \gamma )'(t)\ dt = \rho _b(\gamma (1)) - \rho _b(\gamma (0)) = \rho _b(\textbf{p}) - \rho _b(\textbf{p}_0) = \rho _b(\textbf{p}), \end{aligned}$$

but one can also bound this expression as follows:

$$\begin{aligned} \int _0^1 (\rho _b \circ \gamma )'(t)\ dt&= \int _0^1 \left\langle d\rho _b\vert _{\gamma (t)}, {{\dot{\gamma }}}(t) \right\rangle \ dt \\&\le \int _0^1 \left\| d\rho _b\vert _{\gamma (t)} \right\| \left\| {{\dot{\gamma }}}(t) \right\| \ dt\\&\le \left( \max _{t \in [0,1]} \Vert d\rho _b\vert _{\gamma (t)} \Vert \right) \int _0^1 \left\| {{\dot{\gamma }}}(t) \right\| \ dt \\&= d(\textbf{p}) \max _{t \in [0,1]} \Vert d\rho _b\vert _{\gamma (t)} \Vert . \end{aligned}$$

Putting the two together gives the desired result. \(\square \)

Lemma 6

One can bound \(\Vert d\rho _b\Vert \) around \(\textbf{p}_0\) by:

$$\begin{aligned} \Vert d \rho _b\Vert ^2 \le 1 + \frac{\zeta ^2 - 1}{2w_3^2} \rho _b^2 + \mathcal {O}(\theta ^3). \end{aligned}$$

Proof

The proof is deferred to Appendix 1. \(\square \)

Combining Lemmas 5 and 6 yields an asymptotic bound on the error between the exact distance d and the half-angle approximation \(\rho _b\).

Lemma 7

On any compact neighborhood U of \(\textbf{p}_0\), we have that

$$\begin{aligned} \rho _b^2 \le ( 1 + \varepsilon ) d^2, \text { where } \varepsilon := \frac{\zeta ^2 - 1}{2w_3^2} \zeta ^4 \rho _b^2 + C \left| \theta \right| ^3, \end{aligned}$$
(43)

for some \(C \ge 0\).

Proof

Let \(\textbf{p}\in U\) be given, and let \(\gamma : [0,1] \rightarrow \mathbb {M}_2\) be a minimizing geodesic from \(\textbf{p}_0\) to \(\textbf{p}\). For the distance along \(\gamma \), we know that

$$\begin{aligned} d(\gamma (s)) \le d(\gamma (t)), \text { for } s \le t. \end{aligned}$$

Making use of (42), we know that \(\frac{1}{\zeta } \rho _b \le d \le \zeta \rho _b\) so we can combine this with the previous equation to find:

$$\begin{aligned} \rho _b(\gamma (s)) \le \zeta ^2 \rho _b(\gamma (t)), \text { for } s \le t. \end{aligned}$$

from which we get that

$$\begin{aligned} \max _{t \in [0,1]} \rho _b(\gamma (t)) \le \zeta ^2 \rho _b(\textbf{p}). \end{aligned}$$

Combining this fact with the above two lemmas allows us to conclude (43). \(\square \)
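For completeness, the chain of estimates underlying (43) reads (with the cubic remainder absorbed, uniformly on the compact neighborhood, into the constant \(C \ge 0\)):

$$\begin{aligned} \rho _b(\textbf{p})^2&\le d(\textbf{p})^2 \max _{t \in [0,1]} \Vert d\rho _b\vert _{\gamma (t)} \Vert ^2 \le d(\textbf{p})^2 \left( 1 + \frac{\zeta ^2 - 1}{2w_3^2} \max _{t \in [0,1]} \rho _b(\gamma (t))^2 + \mathcal {O}(\theta ^3) \right) \\&\le d(\textbf{p})^2 \left( 1 + \frac{\zeta ^2 - 1}{2w_3^2}\, \zeta ^4 \rho _b(\textbf{p})^2 + C \left| \theta \right| ^3 \right) = (1 + \varepsilon )\, d(\textbf{p})^2. \end{aligned}$$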

Remark 3

(Region for approximation \(\rho _b \approx d\)) Putting an a priori tolerance bound \(\varepsilon _{tol}\) on the error \(\varepsilon \) (and neglecting the \(\mathcal {O}(\theta ^3)\) term) gives rise to a region \(\Omega _0\) on which the local approximation \(\rho _b\) is appropriate:

$$\begin{aligned} \Omega _0=\{ \textbf{p}\in \mathbb {M}_2 \mid \rho _b(\textbf{p})^2 < \frac{2 w_3^2}{(\zeta ^2-1)\zeta ^4} \varepsilon _{tol}\}. \end{aligned}$$

Thereby, we cannot guarantee a large region of acceptable relative error when \(w_3 \rightarrow 0\) or \(\zeta \rightarrow \infty \). We solve this problem by using \(\rho _{b, com}\) given in (26) instead of \(\rho _b\).
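A short sketch of the resulting guaranteed \(\rho _b\)-radius of \(\Omega _0\) for a given tolerance, neglecting the cubic term (the function name and parameter values are ours):

```python
import numpy as np

def rho_b_radius(eps_tol, w1, w2, w3):
    """Radius, in terms of rho_b, of the region Omega_0 of Remark 3 (cubic term neglected)."""
    zeta = w2 / w1                      # spatial anisotropy, assumed > 1 here
    return (w3 / zeta ** 2) * np.sqrt(2 * eps_tol / (zeta ** 2 - 1))

print(rho_b_radius(eps_tol=0.1, w1=1.0, w2=2.0, w3=1.0))   # ~0.065: only a small region
# can be guaranteed at this anisotropy, which is why rho_{b,com} is used instead
```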

Corollary 5

(Local error morphological kernel) Locally around \(\textbf{p}_0\), we have:

$$\begin{aligned} k^\alpha _b \le (1 + \varepsilon )^{\beta /2} k^\alpha . \end{aligned}$$

Proof

By Lemma 7, one has

$$\begin{aligned} k^\alpha _b:= \frac{1}{\beta } (\rho _b/t)^\beta \le \frac{1}{\beta } ((1 + \varepsilon )d^2/t^2)^{\beta /2} = (1 + \varepsilon )^{\beta /2} k^\alpha . \end{aligned}$$

\(\square \)

6 Experiments

6.1 Error of Half-Angle Approximation

We can quantitatively analyze the error between any distance approximation \(\rho \) and the exact Riemannian distance d as follows. We first choose a region \(\Omega \subseteq \mathbb {M}_2\) on which to analyze the approximation. Just as in Tables 1 and 2, we inspect \(\Omega := [-3,3]\times [-3,3]\times [-\pi ,\pi ) \subseteq \mathbb {M}_2\). As our measure of error \(\varepsilon \), we use the mean relative error defined as:

$$\begin{aligned} \varepsilon := \frac{1}{\mu (\Omega )} \int _{\Omega } \frac{\left| \rho (\textbf{p}) - d(\textbf{p}) \right| }{d(\textbf{p})}\, d\mu (\textbf{p}) \end{aligned}$$
(44)

where \(\mu \) is the induced Riemannian measure determined by the Riemannian metric \(\mathcal {G}\). We then discretize our domain \(\Omega \) into a grid of \(101 \times 101 \times 101\) equally spaced points \(\textbf{p}_i \in \Omega \), indexed by an index set \(i \in I\), and numerically solve for the exact distance d on this grid. This numerical scheme is of course not exact, and we refer to the resulting values as \({\tilde{d}}_i \approx d(\textbf{p}_i)\). We also calculate the value of the distance approximation \(\rho \) on the grid points, \(\rho _i:= \rho (\textbf{p}_i)\). Once we have these values, we approximate the true mean relative error \(\varepsilon \) by the numerical error \({\tilde{\varepsilon }}\) defined by:

$$\begin{aligned} \varepsilon \approx {\tilde{\varepsilon }}:= \frac{1}{\left| I \right| } \sum _{i \in I} \frac{\left| \rho _i - {\tilde{d}}_i \right| }{{\tilde{d}}_i} \end{aligned}$$
(45)
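A sketch of how (45) may be evaluated once the grid values are available (the array names are ours; the numerical solver producing \({\tilde{d}}_i\) is assumed to be given, with \({\tilde{d}}_i > 0\) on the grid):

```python
import numpy as np

def mean_relative_error(rho_vals, d_vals):
    """Numerical mean relative error (45) between approximation values rho_i and
    numerically computed exact distances d_i on the same grid (all d_i > 0)."""
    return float(np.mean(np.abs(rho_vals - d_vals) / d_vals))

# rho_vals, d_vals: arrays of shape (101, 101, 101) sampled on the grid covering
# Omega = [-3,3] x [-3,3] x [-pi,pi); d_vals stems from the numerical distance solver.
```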

In Table 6, the numerical mean relative error \({\tilde{\varepsilon }}\) between the half-angle approximation \(\rho _b\) and the numerical Riemannian distance \({\tilde{d}}\) is listed for different spatial anisotropies \(\zeta \). We keep \(w_1=w_3=1\) constant and vary \(w_2\). We see, as shown visually in Tables 1 and 2, that \(\rho _b\) deteriorates as the spatial anisotropy \(\zeta \) increases.

There is a discrepancy in the table worth mentioning. We know from Remark 2 that when \(\zeta = 1\) then \(\rho _b = d\) and thus \(\varepsilon = 0\). Surprisingly, however, \({\tilde{\varepsilon }} \ne 0\) in the \(\zeta = 1\) case in Table 6. This is simply explained by the fact that the numerical solution \({\tilde{d}}\) is not exactly equal to the true distance d. We expect \({\tilde{\varepsilon }}\) to go to 0 in the \(\zeta = 1\) case as the discretization of \(\Omega \) is refined.

We can compare these numerical results to our theoretical results. Namely, we can deduce from Equation (42) that:

$$\begin{aligned} \frac{\left| \rho _b - d \right| }{d} \le \zeta - 1, \end{aligned}$$
(46)

which means

$$\begin{aligned} \varepsilon \le \zeta - 1. \end{aligned}$$
(47)

We therefore expect this bound to hold, approximately, for the numerical mean relative error \({\tilde{\varepsilon }}\) as well. Indeed, in Table 6 we see that \( {\tilde{\varepsilon }} \lessapprox \zeta - 1\).

Interestingly, we see that \({\tilde{\varepsilon }}\) is relatively small compared to our theoretical bound (47), even in the high-anisotropy cases. However, this is only a consequence of the relative smallness of \(\Omega \): if we enlarge \(\Omega \) further and further, \(\varepsilon \) converges to \(\zeta - 1\). This follows from an argument similar to the reasoning in Corollary 4.

Table 6 Numerical mean relative error \({\tilde{\varepsilon }}\) between \(\rho _b\) and d for multiple spatial anisotropies \(\zeta \)

6.2 DCA1

The DCA1 dataset is a publicly available database “consisting of 130 X-ray coronary angiograms, and their corresponding ground-truth image outlined by an expert cardiologist” [59]. One such angiogram and ground-truth can be seen in Fig. 18a and d.

We have split the DCA1 dataset [59] into a training and test set consisting of 125 and 10 images, respectively.

To establish a baseline, we ran a 3, 6, and 12 layer CNN, G-CNN, and PDE-G-CNN on DCA1. The exact architectures are identical or analogous to the ones used in [28, Fig. 15]. For the baseline, the logarithmic distance approximation \(\rho _c\) was used within the PDE-G-CNNs; this is the same approximation that was used in [28]. Every network was trained 10 times for 80 epochs. After every epoch, the average Dice coefficient on the test set was stored, and after every full training the maximum of these average Dice coefficients over all 80 epochs was calculated. This yields 10 maximum average Dice coefficients for every architecture. The result of this baseline can be seen in Fig. 16. The number of parameters of the networks can be found in Table 7. We see that PDE-G-CNNs consistently perform as well as, and sometimes outperform, G-CNNs and CNNs, while having the fewest parameters of all architectures.
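The evaluation protocol can be summarized as follows (a sketch; `make_network`, `train_one_epoch`, and `dice_on_test_set` are hypothetical placeholders, not functions from [28]):

```python
def max_average_dice_per_run(make_network, train_one_epoch, dice_on_test_set,
                             n_runs=10, n_epochs=80):
    """For each run: train for n_epochs, record the average test Dice after every
    epoch, and keep the maximum over all epochs. Returns one score per run."""
    scores = []
    for _ in range(n_runs):
        network = make_network()
        per_epoch_dice = []
        for _ in range(n_epochs):
            train_one_epoch(network)
            per_epoch_dice.append(dice_on_test_set(network))
        scores.append(max(per_epoch_dice))
    return scores   # e.g. 10 maximum average Dice coefficients per architecture
```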

Fig. 16
figure 16

A scatterplot showing how a 3, 6, and 12 layer CNN, G-CNN, and PDE-G-CNN compare on the DCA1 dataset. The crosses indicate the mean. We see the PDE-G-CNNs provide equal or better results with, respectively, 2, 10, and 35 times fewer parameters, see Table 7

Table 7 The total number of parameters in the networks that are used in Fig. 16

To compare the effect of using different approximative distances, we trained the 6 layer PDE-G-CNN (with 2560 parameters) 10 times for 80 epochs using each distance approximation. The results can be found in Figs. 17 and 18. We see that on DCA1 all distance approximations have a comparable performance. We notice a small drop in performance when using \(\rho _{b,sr}\), and a small improvement when using \(\rho _{b,com}\).

Fig. 17
figure 17

A scatterplot showing how the use of different distance approximations affects the performance of the 6 layer PDE-G-CNN on the DCA1 dataset. The crosses indicate the mean

Fig. 18
figure 18

a Input, b \(\rho _c\), c \(\rho _b\), d Truth, e \(\rho _{b,sr}\), f \(\rho _{b,com}\). In Fig. 18a and d, we see one sample from the DCA1 dataset: a coronary angiogram together with the ground-truth segmentation. The other four pictures show the output of the 6 layer PDE-G-CNN, one for each distance approximation. The networks that were used in this figure have an accuracy approximately equal to the mean accuracy in Fig. 17

6.3 Lines

For the line completion problem, we created a dataset of 512 training images and 128 test images (see Footnote 2). Fig. 21a and d shows one sample of the Lines dataset.

To establish a baseline, we ran a 6 layer CNN, G-CNN, and PDE-G-CNN. For this baseline we again used \(\rho _{c}\) within the PDE-G-CNN, but changed the number of channels to 30 and the kernel sizes to [9, 9, 9], bringing the total number of parameters to 6018. By increasing the kernel size, we anticipate that the difference in effectiveness of the different distance approximations, if there is any, becomes more pronounced. Every network was trained 15 times for 60 epochs. The result of this baseline can be seen in Fig. 19. The number of parameters of the networks can be found in Table 8. We again see that the PDE-G-CNN outperforms the G-CNN, which in turn outperforms the CNN, while having the fewest parameters.

Fig. 19
figure 19

A scatterplot showing how a 6 layer CNN, G-CNN (both with \(\approx 25k\) parameters), and a PDE-G-CNN (with only 6k parameters) compare on the Lines dataset. The crosses indicate the mean. For the precise number of parameters, see Table 8

We again test the effect of using different approximative distances by training the 6 layer PDE-G-CNN 15 times for 60 epochs for every approximation. The results can be found in Fig. 20. We see that on the Lines dataset, all distance approximations again have a comparable performance. We again notice an increase in effectiveness when using \(\rho _{b,com}\), just as on the DCA1 dataset. Interestingly, using \(\rho _{b,sr}\) does not seem to hurt the performance on the Lines dataset, which is in contrast with DCA1. This is in line with what one would expect in view of the existing sub-Riemannian line-perception models in neurogeometry. Furthermore, in Fig. 21b,c,e and f some feature maps of a trained PDE-G-CNN are visualized.

7 Conclusion

In this article, we have carefully analyzed how well the nonlinear erosion and dilation parts of PDE-G-CNNs are actually solved on the homogeneous space of 2D positions and orientations \(\mathbb {M}_2\). According to Proposition 1, the Hamilton–Jacobi equations are solved by morphological kernels that are functions of only the exact (sub)-Riemannian distance function. As a result, every approximation of the exact distance yields a corresponding approximative morphological kernel.

Table 8 The total number of parameters in the networks that are used in Fig. 19
Fig. 20
figure 20

A scatterplot showing how the use of different distance approximations affects the performance of the 6 layer PDE-G-CNN on the Lines dataset. The crosses indicate the mean

Fig. 21
figure 21

a Input, d Truth; b, c, e, f feature maps. In Fig. 21a and d, we see one sample from the Lines dataset. The other four pictures are visualizations of feature maps of the 6 layer PDE-G-CNN. In Fig. 21b and e, we see a feature map of the lifting layer together with its max-projection over \(\theta \). In Fig. 21c and f, we see a feature map of the last PDE layer, just before the final projection layer

In Theorem 1, we use this to improve upon the local and global approximations of the relative errors of the erosion and dilation kernels used in the papers [28, 60] where PDE-G-CNNs were first proposed (and shown to outperform G-CNNs). Our new sharper estimates for the distance on \(\mathbb {M}_2\) have bounds that explicitly depend on the metric tensor field coefficients. This allowed us to theoretically underpin the earlier concern expressed in [28, Fig. 10] that when the spatial anisotropy becomes high, the previous morphological kernel approximations [28] become increasingly inaccurate.

Indeed, as we show qualitatively in Table 2 and quantitatively in Sect. 6.1, if the spatial anisotropy \(\zeta \) is high one must resort to sub-Riemannian approximations. Furthermore, we propose a single distance approximation \(\rho _{b,com}\) that works both for low and high spatial anisotropy.

Apart from how well the kernels approximate the PDEs, there is the issue of how well each of the distance approximations performs in applications within the PDE-G-CNNs. In practice, the analytic approximative kernels using \(\rho _b\), \(\rho _c\), \(\rho _{b,com}\) perform similarly. This is not surprising, as our theoretical results Lemma 3 and Corollary 1 reveal that all morphological kernel approximations carry the correct 8 fundamental symmetries of the PDE. Nevertheless, Figs. 17 and 20 do reveal advantages of using the new kernel approximations (in particular \(\rho _{b,com}\)) over the previous kernel \(\rho _c\) in [28].

The experiments also show that the strictly sub-Riemannian distance approximation \(\rho _{b,sr}\) only performs well on applications where sub-Riemannian geometry really applies. For instance, as can be seen in Figs. 17 and 20, on the DCA1 dataset \(\rho _{b,sr}\) performs relatively poorly, whereas on the Lines dataset \(\rho _{b,sr}\) performs well. This is what one would expect in view of sub-Riemannian models and findings in cortical line-perception [37, 38, 40, 41, 46, 61] in neurogeometry.

Besides better accuracy and better performance of the approximative kernels, there is the issue of geometric interpretability. In G-CNNs and CNNs, geometric interpretability is absent, as they include ad-hoc nonlinearities like ReLUs. PDE-G-CNNs instead employ morphological convolutions with kernels that reflect association fields, as visualized in Fig. 5b. In Fig. 8, we see that as network depth increases association fields visually merge in the feature maps of PDE-G-CNNs toward adaptive line detectors, whereas such merging/grouping of association fields is not visible in normal CNNs.

In all cases, the PDE-G-CNNs still outperform G-CNNs and CNNs on the DCA1 and Lines datasets: they have a higher (or equal) performance, while having a huge reduction in network complexity, even when using only 3 layers. Regardless of the choice of kernel \(\rho _c\), \(\rho _b\), \(\rho _{b,sr}\), \(\rho _{b,com}\), the advantage of PDE-G-CNNs over G-CNNs and CNNs is significant, as can be clearly observed in Figs. 16 and 19 and Tables 7 and 8. This is in line with previous observations on other datasets [28].

Altogether, PDE-G-CNNs offer a better reduction of network complexity, better performance, and better geometric interpretability than basic classical feed-forward (G-)CNNs on various segmentation problems.

Extensive investigations on training data reduction, memory reduction (via U-Net versions of PDE-G-CNNs), and a topological description of the merging of association fields are beyond the scope of this article, and are left for future work.