1 Introduction

Topological data analysis (TDA) is an emerging research field that is proving important in managing the deluge of data of the present digital world. The ability to describe and compare how data are connected to each other in a topological sense is key to comparing them efficiently [3]. Persistent topology and homology are relevant mathematical tools in TDA, and many researchers are investigating these concepts from both a theoretical and an applied point of view [6]. Their approach is based on the fact that datasets can often be represented by real-valued continuous functions defined on a topological space X [2]. The theory of persistence analyzes the properties of these functions that “persist” in the presence of noise. In particular, this analysis can be done by studying the evolution of the k-dimensional holes of the sub-level sets associated with those functions. This theory also admits an extension to the case of functions taking values in \(\mathbb {R}^m\) (cf., e.g., [4, 5]).

Recently, this line of research has been placed in a theoretical framework that could be used to establish a link between persistence theory and machine learning [10]. The main idea consists in looking at shape comparison as a problem concerning the approximation of a given observer instead of the approximation of data. In this setting each observer is seen as a collection of suitable operators acting on the family of functions that represents the set of possible data. These operators describe the way the information is processed by the observer, on the basis of the assumption that the observer is not entitled to choose the data but only the method to process them.

The operators we consider often refer to some kind of invariance. Invariance is an important property in shape analysis, and “approximating an observer” usually means understanding not only the way she/he looks at data, but also the equivalences she/he refers to in data comparison. For example, in character recognition the observer is interested in distinguishing the symbols 6 and 9, so that the invariance group should not contain rotations, while this is no longer true if the observer is interested in comparing spiral shells.

In the presence of an invariance group, the natural pseudo-distance can be used as a ground truth for shape comparison. Let us consider a set \(\varPhi \) of continuous \(\mathbb {R}\)-valued functions defined on a topological space X and a subgroup G of the group \(\mathrm {Homeo}(X)\) of all self-homeomorphisms of X. We assume that the group G acts on \(\varPhi \) by composition on the right. Now we can define the natural pseudo-distance \(d_G\) on \(\varPhi \) by setting \(d_G(\varphi _1,\varphi _2)=\inf _{g\in G}\Vert \varphi _1-\varphi _2 \circ g\Vert _\infty \), where \(\Vert \cdot \Vert _\infty \) denotes the sup-norm. Roughly speaking, \(d_G\) is based on the attempt to find the best correspondence between two functions of \(\varPhi \). If \(d_G(\varphi _1,\varphi _2)\) is small, by definition there exists a homeomorphism \(g\in G\) such that \(\varphi _2\circ g\) is a good approximation of \(\varphi _1\) with respect to the sup-norm. If \(\varphi _1\) and \(\varphi _2\) describe the results of two measurements of X (e.g., two pictures, two CT scans, or two financial series), the fact that \(d_G(\varphi _1,\varphi _2)\) is small means that the considered measurements can be aligned well by the reparameterization expressed by a suitable homeomorphism g.
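
As a toy illustration of this definition (not taken from the paper), assume that X is the finite cyclic set \(\{0,\dots ,n-1\}\) and that G is the group of cyclic shifts, so that the infimum becomes a minimum over n shifts. The function names below are hypothetical, and functions on X are stored as plain lists of values.

```python
def shift(phi, s):
    """Return phi composed with the cyclic shift g_s(x) = (x + s) mod n."""
    n = len(phi)
    return [phi[(x + s) % n] for x in range(n)]


def sup_dist(phi1, phi2):
    """Sup-norm distance between two functions on the same finite set."""
    return max(abs(a - b) for a, b in zip(phi1, phi2))


def natural_pseudo_distance(phi1, phi2):
    """d_G(phi1, phi2) for G = cyclic shifts: minimize over all n shifts."""
    return min(sup_dist(phi1, shift(phi2, s)) for s in range(len(phi2)))


phi1 = [0.0, 1.0, 3.0, 1.0]
phi2 = [1.0, 0.0, 1.0, 3.0]  # phi1 read from a different starting point

print(sup_dist(phi1, phi2))                 # 2.0
print(natural_pseudo_distance(phi1, phi2))  # 0.0
```

The two functions are far apart in the sup-norm but have vanishing natural pseudo-distance, because a shift aligns them exactly. For infinite groups no such exhaustive search is available, which is the difficulty addressed in the sequel.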

Unfortunately, \(d_G\) is usually difficult to compute. However, one can approximate the natural pseudo-distance by means of persistent homology and G-invariant non-expansive operators (GINOs).

We recall that persistent homology describes the k-dimensional holes (components, tunnels, voids, ... ) of the sub-level sets of a topological space X with respect to a given continuous function \(\varphi :X\rightarrow \mathbb {R}^m\). If \(m=1\), persistent homology is described by suitable collections of points called persistence diagrams [8]. These diagrams can be compared by a suitable metric \(d_{\mathrm {match}}\), called bottleneck (or matching) distance (see the appendix of this paper).
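
To make the notion of persistence diagram concrete in the simplest case \(k=0\), \(m=1\), the following sketch computes the 0-dimensional persistence pairs of the sub-level sets of a function given by its values at the vertices of a path graph (a triangulated segment). It is only an illustration under these assumptions: it uses a standard union-find sweep with the elder rule, and the function name is hypothetical.

```python
def sublevel_persistence_0d(values):
    """0-dimensional persistence pairs (birth, death) of the sub-level sets
    of a function defined on the vertices 0..n-1 of a path graph."""
    n = len(values)
    parent = [None] * n  # None means: vertex not yet in the sub-level set
    birth = {}           # vertex -> value at which it entered the filtration

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    pairs = []
    # Sweep the vertices by increasing function value.
    for v in sorted(range(n), key=lambda i: values[i]):
        parent[v] = v
        birth[v] = values[v]
        for w in (v - 1, v + 1):
            if 0 <= w < n and parent[w] is not None:
                rv, rw = find(v), find(w)
                if rv == rw:
                    continue
                # Elder rule: the component born later dies at this merge.
                young, old = (rv, rw) if birth[rv] > birth[rw] else (rw, rv)
                if birth[young] < values[v]:  # skip zero-persistence pairs
                    pairs.append((birth[young], values[v]))
                parent[young] = old
    # The component of the global minimum never dies.
    pairs.append((min(values), float("inf")))
    return sorted(pairs)


print(sublevel_persistence_0d([0.0, 2.0, 1.0, 3.0]))
# [(0.0, inf), (1.0, 2.0)]
```

In the example, the local minimum of value 1 creates a component that merges with the older one at level 2, which gives the point (1, 2) of the diagram.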

It is known that if we compute the classical bottleneck distance between persistence diagrams associated with the functions \(F(\varphi _1)\), \(F(\varphi _2)\) and let F vary in the set of all G-invariant non-expansive operators on the space \(\varPhi \), we obtain the same information given by the natural pseudo-distance \(d_G\) [11]. Therefore, the goal of approximating \(d_G\) naturally leads to the problem of approximating the space \(\mathcal {F}(\varPhi ,G)\) of all G-invariant non-expansive operators on \(\varPhi \). In [11] it has been proved that this space is compact, if we assume that \(\varPhi \) is compact. This guarantees that, in principle, \(\mathcal {F}(\varPhi ,G)\) can be approximated by a finite subset.

In order to proceed along this line of research we need general methods to build G-invariant non-expansive operators. To this end, this paper is devoted to proving some new results about the algebra of GINOs.

Our work is organized as follows. In Sect. 2 we introduce our mathematical setting. In Sect. 3 we give some new results about G-invariant non-expansive operators: in particular, we show how we can produce new GINOs by composition, translation, weighted average, maximization and, more generally, by means of a 1-Lipschitzian function applied to pre-existing GINOs. A short appendix about persistent homology concludes the paper.

2 Our Mathematical Model

Let X be a non-empty compact metric space, triangulated by a finite (and hence compact) simplicial complex. We suppose that the k-th homology group of X is nontrivial. For \(k=0\) this assumption is always satisfied. Since X could be embedded in a larger (finitely) triangulable space \(Y_k\) with nontrivial homology in degree k, and substituted with \(Y_k\), for \(k\ge 1\) the condition is not restrictive. Let us consider a subspace \(\varPhi \) of the topological space \(C^0(X,\mathbb {R})\) of all real-valued continuous functions on X, endowed with the topology induced by the sup-norm \(\Vert \cdot \Vert _{\infty }\). Since X is compact, the considered functions are uniformly continuous. We suppose that \(\varPhi \) contains at least the constant functions taking every finite value c with \(|c| \le \sup _{\varphi \in \varPhi }\Vert \varphi \Vert _{\infty }\). Each function in the space \(\varPhi \) will be called an admissible filtering function on X. The space \(\varPhi \) contains the functions that the observer considers as acceptable data. We also assume that a subgroup G of the group \(\mathrm {Homeo}(X)\) of all homeomorphisms from X onto X is given, and that if \(\varphi \in \varPhi \) and \(g \in G\), then \(\varphi \circ g \in \varPhi \). In other words, the group G acts on \(\varPhi \) by composition on the right. We do not require G to be a proper subgroup of \(\mathrm {Homeo}(X)\), so that the equality \(G=\mathrm {Homeo}(X)\) may hold. One can easily show that G is a topological group with respect to the topology of uniform convergence, and that the right action of G on the set \(\varPhi \) is continuous.

Definition 1

Assume that a space \(\varPhi \subseteq C^0(X,\mathbb {R})\) and a group \(G\subseteq \mathrm {Homeo}(X)\) are given. A function \(F :\varPhi \rightarrow \varPhi \) is called a G-invariant Non-expansive Operator (GINO) for the pair \((\varPhi ,G)\) if:

  1. F is G-invariant: \(F(\varphi \circ g)=F(\varphi ) \circ g, \ \forall \varphi \in \varPhi , \ \forall g \in G \);

  2. F is non-expansive: \(\left\| F(\varphi _1) - F(\varphi _2) \right\| _{\infty }\le \left\| \varphi _1 - \varphi _2 \right\| _{\infty }, \ \forall \varphi _1,\varphi _2 \in \varPhi \).

If \(\varPhi \) is the space of all normalized grayscale images represented as functions from \(\mathbb {R}^2\) to [0, 1] and G is the group of rigid motions of the plane, a simple example of operator \(F\in \mathcal {F}(\varPhi ,G)\) is given by the Gaussian blurring filter, i.e. the operator F taking \(\varphi \in \varPhi \) to the function \(\psi (x)=\frac{1}{2\pi \sigma ^2}\int _{\mathbb {R}^2}\varphi (y)e^{-\frac{\Vert x-y\Vert ^2}{2\sigma ^2}}\ dy\) (see Fig. 1).

Fig. 1. The Gaussian blurring filter as an example of a G-invariant non-expansive operator, for G equal to the group of rigid motions of the plane.
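
The two conditions of Definition 1 can be tested numerically on a discrete analogue of the blurring example. The sketch below is an assumption-laden toy, not the operator of Fig. 1: X is a finite cyclic set, G is the group of cyclic shifts, and `blur` is a hypothetical 3-point circular average with weights 1/4, 1/2, 1/4. A random test gives evidence for the two properties, not a proof.

```python
import random


def shift(phi, s):
    """Return phi composed with the cyclic shift g_s(x) = (x + s) mod n."""
    n = len(phi)
    return [phi[(x + s) % n] for x in range(n)]


def sup_dist(phi1, phi2):
    """Sup-norm distance between two functions on the same finite set."""
    return max(abs(a - b) for a, b in zip(phi1, phi2))


def blur(phi):
    """Circular 3-point weighted average (discrete stand-in for a blur)."""
    n = len(phi)
    return [0.25 * phi[(x - 1) % n] + 0.5 * phi[x] + 0.25 * phi[(x + 1) % n]
            for x in range(n)]


def commutes_with_shifts(F, phi, tol=1e-12):
    """Check F(phi o g) = F(phi) o g for every cyclic shift g."""
    return all(sup_dist(F(shift(phi, s)), shift(F(phi), s)) <= tol
               for s in range(len(phi)))


def is_non_expansive_on(F, phi1, phi2, tol=1e-12):
    """Check ||F(phi1) - F(phi2)|| <= ||phi1 - phi2|| for one pair."""
    return sup_dist(F(phi1), F(phi2)) <= sup_dist(phi1, phi2) + tol


rng = random.Random(0)
samples = [[rng.random() for _ in range(8)] for _ in range(30)]
invariant = all(commutes_with_shifts(blur, phi) for phi in samples)
non_expansive = all(is_non_expansive_on(blur, p, q)
                    for p in samples for q in samples)
print(invariant, non_expansive)
```

Since the weights are non-negative and sum to 1, `blur` also maps functions with values in [0, 1] to functions with values in [0, 1], so it sends this toy space of admissible functions to itself.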

For another approach to G-invariant persistent homology we refer the interested reader to [9].

3 Some New Results on Group Invariant Non-expansive Operators

In this section we will prove some new results about the algebra of GINOs, showing how new GINOs can be built by using pre-existing GINOs. The simplest one is based on functional composition.

Proposition 1

If \(F_1,F_2\) are GINOs for \((\varPhi ,G)\), then \(F:= F_2 \circ F_1\) is a GINO for \((\varPhi ,G)\).

Proof

  1. Since \(F_1,F_2\) are G-invariant, F is G-invariant:

    $$\begin{aligned} F(\varphi \circ g)= & {} (F_2 \circ F_1)(\varphi \circ g)=F_2(F_1(\varphi \circ g))\\= & {} F_2(F_1(\varphi )\circ g)=F_2(F_1(\varphi ))\circ g\\= & {} F(\varphi )\circ g \end{aligned}$$

    for every \(\varphi \in \varPhi \) and every \(g \in G\).

  2. Since \(F_1,F_2\) are non-expansive, F is non-expansive:

    $$\begin{aligned} \Vert F(\varphi _1) - F(\varphi _2)\Vert _{\infty }= & {} \Vert (F_2 \circ F_1)(\varphi _1) - (F_2 \circ F_1)(\varphi _2)\Vert _{\infty }\\= & {} \Vert F_2(F_1(\varphi _1)) - F_2(F_1(\varphi _2))\Vert _{\infty }\\\le & {} \Vert F_1(\varphi _1) - F_1(\varphi _2)\Vert _{\infty }\\\le & {} \Vert \varphi _1 - \varphi _2 \Vert _{\infty }\\ \end{aligned}$$

    \(\forall \varphi _1,\varphi _2 \in \varPhi \).

    \(\square \)
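
Proposition 1 can be illustrated on a toy model in which X is a finite cyclic set and G is the group of cyclic shifts. The operators `blur` and `dilate` and the helper `compose` are hypothetical names introduced only for this sketch; both operators commute with shifts and are non-expansive, the second one by the inequality stated in Lemma 1.

```python
from functools import reduce


def shift(phi, s):
    """Return phi composed with the cyclic shift g_s(x) = (x + s) mod n."""
    n = len(phi)
    return [phi[(x + s) % n] for x in range(n)]


def sup_dist(phi1, phi2):
    """Sup-norm distance between two functions on the same finite set."""
    return max(abs(a - b) for a, b in zip(phi1, phi2))


def blur(phi):
    """Circular 3-point weighted average."""
    n = len(phi)
    return [0.25 * phi[(x - 1) % n] + 0.5 * phi[x] + 0.25 * phi[(x + 1) % n]
            for x in range(n)]


def dilate(phi):
    """Maximum over each point and its two cyclic neighbours."""
    n = len(phi)
    return [max(phi[(x - 1) % n], phi[x], phi[(x + 1) % n]) for x in range(n)]


def compose(*ops):
    """compose(F1, ..., Fk) applies F1 first and Fk last."""
    return lambda phi: reduce(lambda acc, op: op(acc), ops, phi)


F = compose(blur, dilate)  # F = dilate o blur

phi1 = [0.0, 1.0, 0.0, 0.0, 0.5, 0.0]
phi2 = [0.0, 0.8, 0.1, 0.0, 0.5, 0.2]

# G-invariance on one shift, and non-expansivity on one pair.
print(sup_dist(F(shift(phi1, 2)), shift(F(phi1), 2)) <= 1e-12)
print(sup_dist(F(phi1), F(phi2)) <= sup_dist(phi1, phi2) + 1e-12)
```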

Let \(F_1, \dots , F_n\) be GINOs for \((\varPhi ,G)\). We can consider the function

$$\max (F_1,\dots , F_n)(\varphi ):=[\max (F_1(\varphi ),\dots , F_n(\varphi ))] $$

from \(\varPhi \) to \(C^0(X,\mathbb {R})\), where \([\max (F_1(\varphi ),\dots , F_n(\varphi ))]\) is defined by setting

$$[\max (F_1(\varphi ),\dots , F_n(\varphi ))](x):=\max (F_1(\varphi )(x),\dots , F_n(\varphi )(x)).$$

Proposition 2

Let \(F_1, \dots , F_n\) be GINOs for \((\varPhi ,G)\).

If \(\max (F_1,\dots , F_n)(\varPhi )\subseteq \varPhi \), then \(\max (F_1,\dots , F_n)\) is a GINO for \((\varPhi ,G)\).

In order to proceed, we recall the proof of the following lemma (cf. [1]):

Lemma 1

For every \(u_1, \dots , u_n, v_1, \dots , v_n \in \mathbb {R}\) it holds that

$$|\max ( u_1, \dots , u_n) - \max ( v_1, \dots , v_n)|\le \max ( |u_1 - v_1|, \dots , |u_n - v_n|).$$

Proof

Without loss of generality we can suppose that \(\max ( u_1, \dots , u_n) = u_1\). If \(\max ( v_1, \dots , v_n) = v_1\) the claim trivially follows. It only remains to check the case \(\max ( v_1, \dots , v_n) = v_i, \ i \ne 1\). We have that

$$\begin{aligned} \max ( u_1, \dots , u_n) - \max ( v_1, \dots , v_n)= & {} u_1 - v_i\\\le & {} u_1 -v_1\\\le & {} |u_1 - v_1| \\\le & {} \max ( |u_1 - v_1|, \dots , |u_n - v_n|). \end{aligned}$$

Similarly, we obtain

$$\begin{aligned} \max ( v_1, \dots , v_n) - \max ( u_1, \dots , u_n)= & {} v_i - u_1\\\le & {} v_i - u_i\\\le & {} |u_i - v_i|\\\le & {} \max ( |u_1 - v_1|, \dots , |u_n - v_n|). \end{aligned}$$

This proves the statement.     \(\square \)

Now we can prove Proposition 2:

Proof

  1. Since \(F_1, \dots , F_n\) are G-invariant, \(\max (F_1,\dots , F_n)\) is G-invariant:

    $$\begin{aligned} \max (F_1,\dots , F_n)(\varphi \circ g)= & {} [\max (F_1(\varphi \circ g), \dots , F_n(\varphi \circ g))]\\= & {} [\max (F_1(\varphi ) \circ g, \dots , F_n(\varphi ) \circ g)]\\= & {} [\max (F_1(\varphi ), \dots , F_n(\varphi ))] \circ g\\= & {} \max (F_1,\dots , F_n)(\varphi ) \circ g \end{aligned}$$

    for every \(\varphi \in \varPhi \) and every \(g \in G\).

  2. Lemma 1 and the non-expansivity of \(F_1, \dots , F_n\) imply that \(\forall x \in X\) and \(\forall \varphi _1, \varphi _2 \in \varPhi \):

    $$\begin{aligned}&|\max (F_1(\varphi _1)(x), \dots , F_n(\varphi _1)(x)) - \max (F_1(\varphi _2)(x), \dots , F_n(\varphi _2)(x))| \\\le & {} \max (|F_1(\varphi _1)(x) - F_1(\varphi _2)(x)|, \dots , |F_n(\varphi _1)(x) - F_n(\varphi _2)(x)|)\\\le & {} \max (\Vert F_1(\varphi _1) - F_1(\varphi _2)\Vert _{\infty }, \dots , \Vert F_n(\varphi _1) - F_n(\varphi _2)\Vert _{\infty })\\\le & {} \max (\Vert \varphi _1 - \varphi _2\Vert _{\infty }, \dots , \Vert \varphi _1 - \varphi _2\Vert _{\infty })\\= & {} \Vert \varphi _1 -\varphi _2\Vert _{\infty }. \end{aligned}$$

    Since this holds for every \(x \in X\), we obtain that \(\Vert \max (F_1,\dots , F_n)(\varphi _1) - \max (F_1,\dots , F_n)(\varphi _2)\Vert _{\infty } \le \Vert \varphi _1 -\varphi _2\Vert _{\infty }.\)

    \(\square \)
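
Both Lemma 1 and Proposition 2 lend themselves to a quick numerical sanity check. The sketch below assumes a toy setting (finite cyclic X, G the cyclic shifts, \(\varPhi \) all real-valued functions on X, so that the hypothesis \(\max (F_1,\dots , F_n)(\varPhi )\subseteq \varPhi \) holds trivially); `pointwise_max`, `identity` and `blur` are hypothetical names.

```python
import random


def shift(phi, s):
    """Return phi composed with the cyclic shift g_s(x) = (x + s) mod n."""
    n = len(phi)
    return [phi[(x + s) % n] for x in range(n)]


def sup_dist(phi1, phi2):
    """Sup-norm distance between two functions on the same finite set."""
    return max(abs(a - b) for a, b in zip(phi1, phi2))


def identity(phi):
    return list(phi)


def blur(phi):
    """Circular 3-point weighted average."""
    n = len(phi)
    return [0.25 * phi[(x - 1) % n] + 0.5 * phi[x] + 0.25 * phi[(x + 1) % n]
            for x in range(n)]


def pointwise_max(*ops):
    """Build the operator phi -> [max(F_1(phi), ..., F_n(phi))]."""
    def F(phi):
        images = [op(phi) for op in ops]
        return [max(vals) for vals in zip(*images)]
    return F


rng = random.Random(1)

# Lemma 1 on random tuples u, v of length 4.
lemma_ok = True
for _ in range(1000):
    u = [rng.uniform(-1, 1) for _ in range(4)]
    v = [rng.uniform(-1, 1) for _ in range(4)]
    bound = max(abs(a - b) for a, b in zip(u, v))
    lemma_ok = lemma_ok and abs(max(u) - max(v)) <= bound

# Proposition 2 on random functions.
F = pointwise_max(identity, blur)
samples = [[rng.random() for _ in range(8)] for _ in range(30)]
invariant = all(sup_dist(F(shift(p, s)), shift(F(p), s)) <= 1e-12
                for p in samples for s in range(8))
non_expansive = all(sup_dist(F(p), F(q)) <= sup_dist(p, q) + 1e-12
                    for p in samples for q in samples)
print(lemma_ok, invariant, non_expansive)
```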

Let F be a GINO for \((\varPhi ,G)\) and \(b \in \mathbb {R}\). We can consider the function

$$F_b(\varphi ):=F(\varphi ) - b$$

from \(\varPhi \) to \(C^0(X,\mathbb {R})\).

Proposition 3

Assume that F is a GINO for \((\varPhi , G)\) and \(b \in \mathbb {R}\). If \(F_b(\varPhi ) \subseteq \varPhi \) then the operator \(F_b\) is a GINO for \((\varPhi , G)\).

Proof

  1. Since F is G-invariant, \(F_b\) is G-invariant too:

    $$F_b(\varphi \circ g)= F(\varphi \circ g) -b = F(\varphi ) \circ g - b = F_b(\varphi ) \circ g $$

    for every \(\varphi \in \varPhi \) and every \(g \in G\).

  2. Since F is non-expansive, \(F_b\) is non-expansive too:

    $$\begin{aligned} \Vert F_b(\varphi _1) - F_b(\varphi _2) \Vert _{\infty }= & {} \Vert F(\varphi _1) - b - (F(\varphi _2) - b) \Vert _{\infty }\\= & {} \Vert F(\varphi _1) - F(\varphi _2) \Vert _{\infty }\\\le & {} \Vert \varphi _1 - \varphi _2 \Vert _{\infty } \end{aligned}$$

    for every \(\varphi _1, \varphi _2 \in \varPhi \).

    \(\square \)

Let \(F_1, \dots , F_n\) be GINOs for \((\varPhi ,G)\) and \((a_1, \dots , a_n) \in \mathbb {R}^n\) with \(\sum _{i=1}^n |a_i| \le 1\). We can consider the function

$$F_{\varSigma }(\varphi ):= \sum _{i=1}^n a_i F_i(\varphi )$$

from \(\varPhi \) to \(C^0(X,\mathbb {R})\).

Proposition 4

Assume that \(F_1, \dots , F_n\) are GINOs for \((\varPhi ,G)\) and \((a_1, \dots , a_n) \in \mathbb {R}^n\) with \(\sum _{i=1}^n |a_i| \le 1\). If \(F_{\varSigma }(\varPhi )\subseteq \varPhi \), then \(F_{\varSigma }\) is a GINO for \((\varPhi , G)\).

Proof

  1. \(F_{\varSigma }\) is G-invariant, because \(F_1, \dots , F_n\) are G-invariant:

    $$F_{\varSigma }(\varphi \circ g)= \sum _{i=1}^n a_i F_i(\varphi \circ g)= \sum _{i=1}^n a_i (F_i(\varphi )\circ g)= F_{\varSigma }(\varphi ) \circ g$$

    for every \(\varphi \in \varPhi \) and for every \(g \in G\).

  2. Since \(F_1, \dots , F_n\) are non-expansive and \(\sum _{i=1}^n |a_i| \le 1\), \(F_{\varSigma }\) is non-expansive:

    $$\begin{aligned} \Vert F_{\varSigma }(\varphi _1) - F_{\varSigma }(\varphi _2)\Vert _{\infty }= & {} \left\| \sum _{i=1}^n a_i F_i(\varphi _1) - \sum _{i=1}^n a_i F_i(\varphi _2) \right\| _{\infty }\\= & {} \left\| \sum _{i=1}^n a_i (F_i(\varphi _1) - F_i(\varphi _2))\right\| _{\infty } \\\le & {} \sum _{i=1}^n |a_i| \left\| F_i(\varphi _1) - F_i(\varphi _2)\right\| _{\infty } \\\le & {} \sum _{i=1}^n |a_i| \left\| \varphi _1 - \varphi _2 \right\| _{\infty }\le \left\| \varphi _1 - \varphi _2\right\| _{\infty } \end{aligned}$$

    for every \(\varphi _1, \varphi _2 \in \varPhi \).

    \(\square \)
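
The role of the condition \(\sum _{i=1}^n |a_i| \le 1\) can be seen in a small experiment. The sketch below assumes a toy setting (finite cyclic X, G the cyclic shifts, \(\varPhi \) all real-valued functions on X); `weighted_sum` is a hypothetical helper that rejects weights violating the condition, and the operator `double` shows how non-expansivity fails without it.

```python
def sup_dist(phi1, phi2):
    """Sup-norm distance between two functions on the same finite set."""
    return max(abs(a - b) for a, b in zip(phi1, phi2))


def identity(phi):
    return list(phi)


def blur(phi):
    """Circular 3-point weighted average."""
    n = len(phi)
    return [0.25 * phi[(x - 1) % n] + 0.5 * phi[x] + 0.25 * phi[(x + 1) % n]
            for x in range(n)]


def weighted_sum(ops, weights):
    """Build phi -> sum_i a_i F_i(phi), requiring sum_i |a_i| <= 1."""
    if len(ops) != len(weights):
        raise ValueError("one weight per operator is required")
    if sum(abs(a) for a in weights) > 1:
        raise ValueError("the weights must satisfy sum |a_i| <= 1")

    def F(phi):
        images = [op(phi) for op in ops]
        return [sum(a * v for a, v in zip(weights, vals))
                for vals in zip(*images)]
    return F


phi1 = [0.0, 1.0, 0.0, 0.0]
phi2 = [0.0, 0.0, 0.0, 0.0]

# Admissible weights: |0.5| + |-0.5| = 1.
F = weighted_sum([identity, blur], [0.5, -0.5])
print(sup_dist(F(phi1), F(phi2)) <= sup_dist(phi1, phi2))  # True

# Weight 2 violates the condition and doubles distances.
def double(phi):
    return [2.0 * v for v in phi]

print(sup_dist(double(phi1), double(phi2)), sup_dist(phi1, phi2))  # 2.0 1.0
```

Here `F(phi1)` equals `[-0.125, 0.25, -0.125, 0.0]`, at sup-distance 0.25 from the zero function, while the inputs are at distance 1.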

The last three results are generalized by the next one. Let \(F_1, \dots , F_n\) be GINOs for \((\varPhi ,G)\) and L be a 1-Lipschitzian map from \(\mathbb {R}^n\) to \(\mathbb {R}\), where \(\mathbb {R}^n\) is endowed with the usual norm \(\Vert (x_1, \dots , x_n)\Vert _{\infty }=\max _{1\le i \le n}|x_i|\). Now we consider the function

$$L^*(F_1,\dots , F_n)(\varphi ):=[L(F_1(\varphi ),\dots , F_n(\varphi ))] $$

from \(\varPhi \) to \(C^0(X,\mathbb {R})\), where \([L(F_1(\varphi ),\dots , F_n(\varphi ))]\) is defined by setting

$$[L(F_1(\varphi ),\dots , F_n(\varphi ))](x):=L(F_1(\varphi )(x),\dots , F_n(\varphi )(x)).$$
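
To see why the three previous constructions are covered, one can check that each of them corresponds to a map L that is 1-Lipschitzian with respect to the norm \(\Vert \cdot \Vert _{\infty }\) on \(\mathbb {R}^n\). Writing \(u=(u_1,\dots ,u_n)\) and \(v=(v_1,\dots ,v_n)\), a short verification reads:

$$\begin{aligned} L(u)=\max (u_1,\dots ,u_n):&\quad |L(u)-L(v)|\le \max _{1\le i\le n}|u_i-v_i|=\Vert u-v\Vert _{\infty } \quad \text {(Lemma 1)},\\ L(u)=u_1-b \ (n=1):&\quad |L(u)-L(v)|=|u_1-v_1|=\Vert u-v\Vert _{\infty },\\ L(u)=\sum _{i=1}^n a_i u_i:&\quad |L(u)-L(v)|\le \sum _{i=1}^n |a_i|\,|u_i-v_i|\le \Big (\sum _{i=1}^n |a_i|\Big )\Vert u-v\Vert _{\infty }\le \Vert u-v\Vert _{\infty }. \end{aligned}$$

In each case \(L^*(F_1,\dots , F_n)\) coincides with the operator \(\max (F_1,\dots , F_n)\), \(F_b\) or \(F_{\varSigma }\), respectively.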

Proposition 5

Assume that \(F_1, \dots , F_n\) are GINOs for \((\varPhi , G)\) and L is a 1-Lipschitzian map from \(\mathbb {R}^n\) to \(\mathbb {R}\). If \(L^*(F_1,\dots , F_n)(\varPhi ) \subseteq \varPhi \), then \(L^*(F_1,\dots , F_n)\) is a GINO for \((\varPhi , G)\).

Proof

  1. The G-invariance of \(F_1, \dots , F_n\) implies that \(L^*(F_1,\dots , F_n)\) is G-invariant:

    $$\begin{aligned} L^*(F_1,\dots , F_n)(\varphi \circ g)= & {} [L (F_1(\varphi \circ g), \dots , F_n(\varphi \circ g))]\\= & {} [L (F_1(\varphi ) \circ g, \dots , F_n(\varphi ) \circ g)]\\= & {} [L (F_1(\varphi ), \dots , F_n(\varphi ))] \circ g\\= & {} L^*(F_1,\dots , F_n)(\varphi ) \circ g \end{aligned}$$

    for every \(\varphi \in \varPhi \) and every \(g \in G\).

  2. Since \(F_1, \dots , F_n\) are non-expansive and L is 1-Lipschitzian, for every \(x \in X\) and every \(\varphi _1, \varphi _2 \in \varPhi \) we have that

    $$\begin{aligned}&|L(F_1(\varphi _1)(x),\dots , F_n(\varphi _1)(x)) - L(F_1(\varphi _2)(x),\dots , F_n(\varphi _2)(x))|\\\le & {} \Vert ( F_1(\varphi _1)(x) - F_1(\varphi _2)(x), \dots , F_n(\varphi _1)(x) - F_n(\varphi _2)(x))\Vert _{\infty }\\= & {} \max _{1 \le i \le n} |F_i(\varphi _1)(x) - F_i(\varphi _2)(x)|\\\le & {} \max _{1 \le i \le n} \Vert F_i(\varphi _1) - F_i(\varphi _2)\Vert _{\infty } \\\le & {} \Vert \varphi _1 - \varphi _2\Vert _{\infty }. \end{aligned}$$

    In conclusion,

    $$\Vert L^*(F_1,\dots , F_n)(\varphi _1) - L^*(F_1,\dots , F_n)(\varphi _2)\Vert _{\infty } \le \Vert \varphi _1 - \varphi _2 \Vert _{\infty } .$$

    Therefore \( L^*(F_1,\dots , F_n)\) is non-expansive.

    \(\square \)
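
Proposition 5 also yields operators that are not covered by the previous results. As an illustration under stated assumptions (finite cyclic X, G the cyclic shifts, \(\varPhi \) all real-valued functions on X), the sketch below takes L to be the median of three numbers, which is 1-Lipschitzian with respect to the max-norm on \(\mathbb {R}^3\). The names `lift`, `median3`, `blur` and `dilate` are hypothetical.

```python
import random
from statistics import median


def shift(phi, s):
    """Return phi composed with the cyclic shift g_s(x) = (x + s) mod n."""
    n = len(phi)
    return [phi[(x + s) % n] for x in range(n)]


def sup_dist(phi1, phi2):
    """Sup-norm distance between two functions on the same finite set."""
    return max(abs(a - b) for a, b in zip(phi1, phi2))


def identity(phi):
    return list(phi)


def blur(phi):
    """Circular 3-point weighted average."""
    n = len(phi)
    return [0.25 * phi[(x - 1) % n] + 0.5 * phi[x] + 0.25 * phi[(x + 1) % n]
            for x in range(n)]


def dilate(phi):
    """Maximum over each point and its two cyclic neighbours."""
    n = len(phi)
    return [max(phi[(x - 1) % n], phi[x], phi[(x + 1) % n]) for x in range(n)]


def lift(L, *ops):
    """Build the operator phi -> [L(F_1(phi), ..., F_n(phi))]."""
    def F(phi):
        images = [op(phi) for op in ops]
        return [L(*vals) for vals in zip(*images)]
    return F


def median3(a, b, c):
    """Median of three reals: a 1-Lipschitz map for the max-norm."""
    return median([a, b, c])


F = lift(median3, identity, blur, dilate)

rng = random.Random(2)
samples = [[rng.random() for _ in range(8)] for _ in range(30)]
invariant = all(sup_dist(F(shift(p, s)), shift(F(p), s)) <= 1e-12
                for p in samples for s in range(8))
non_expansive = all(sup_dist(F(p), F(q)) <= sup_dist(p, q) + 1e-12
                    for p in samples for q in samples)
print(invariant, non_expansive)
```

Replacing `median3` by `max`, or by a weighted sum whose weights satisfy \(\sum _{i}|a_i|\le 1\), gives back the operators of Propositions 2 and 4 through the same helper.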

4 Conclusions

In this paper we have illustrated some methods to build new G-invariant non-expansive operators from pre-existing ones. The ability to do so is important to produce large sets of GINOs, in order to get good approximations of the topological space \(\mathcal {F}(\varPhi ,G)\) and hence good approximations of the natural pseudo-distance \(d_G\). The approximation of \(\mathcal {F}(\varPhi ,G)\) can be seen as an approximation of the considered observer, represented as a collection of invariant operators.

Fig. 2. GIPHOD asks the user to choose an invariance group in a list and a query image in a dataset.

Fig. 3. GIPHOD provides ten images that are judged to be the most similar to the proposed query image with respect to the chosen invariance group.

In order to show the use of the approach based on GINOs, a simple demonstrator has been built, illustrating how this technique could provide new methods for image comparison. The demonstrator is named GIPHOD (G-Invariant Persistent HOmology Demonstrator) and is available at the web page http://giphod.ii.uj.edu.pl/index2/ (joint work with Grzegorz Jabłoński and Marc Ethier). The user is asked to choose an invariance group in a list and a query image in a dataset \(\varPhi ^*\) of quite simple synthetic images obtained by adding a small number of bell-like functions (see Fig. 2). After that, GIPHOD provides ten images that are judged to be the most similar to the proposed query image with respect to the chosen invariance group (see Fig. 3). In this case study, the dataset \(\varPhi ^*\) is a subset of the set \(\varPhi \) of all continuous functions from the square \([0,1] \times [0, 1]\) to the interval [0, 1]. Each function represents a grayscale image on the square \([0,1]\times [0,1]\) (1=white, 0=black). GIPHOD uses a collection of GINOs for each invariance group G and tries to approximate \(d_G\) by means of the previously described technique, based on computing the persistent homology of the functions \(F(\varphi )\) for \(\varphi \in \varPhi \) and F varying in our set of operators.

Many questions remain open. In particular, the extension of our approach to operators taking the pair \((\varPhi ,G)\) into a different pair \((\varPsi ,H)\) should be studied. We are planning to do that in a forthcoming paper.