On the finite representation of linear group equivariant operators via permutant measures

Recent advances in machine learning have highlighted the importance of using group equivariant non-expansive operators for building neural networks in a more transparent and interpretable way. An operator is called equivariant with respect to a group if the action of the group commutes with the operator. Group equivariant non-expansive operators can be seen as multi-level components that can be joined and connected in order to form neural networks by applying the operations of chaining, convex combination and direct product. In this paper we prove that each linear G-equivariant non-expansive operator (GENEO) can be produced by a weighted summation associated with a suitable permutant measure, provided that the group G transitively acts on a finite signal domain. This result is based on the Birkhoff–von Neumann decomposition of doubly stochastic matrices and some well known facts in group theory. Our theorem makes available a new method to build all linear GENEOs with respect to a transitively acting group in the finite setting. This work is part of the research devoted to develop a good mathematical theory of GENEOs, seen as relevant components in machine learning.


Introduction
The development of new mathematical approaches and results for deep learning is an important goal at the present time. In particular, new methods and theories are requested in order to understand and control the behavior of neural networks. In this line of research, the scientific community is devoting more and more attention to the use of equivariant operators, since they appear to be of great importance for the progress of machine learning [2,8,22]. We recall that an operator is called equivariant with respect to a group if the action of the group commutes with the operator. For example, the operator associating each regular function f : Rⁿ → R with its Laplacian ∆f commutes with each Euclidean isometry of Rⁿ. The property of equivariance is of use when we want to mimic the behavior of observers and agents that are known to respect some symmetries. In fact, the use of equivariant operators allows us to inject pre-existing knowledge into the system, thus increasing our control on the construction of neural networks [4]. Furthermore, invariant and non-expansive operators can be used to reduce data variability [20,21], and in the last years equivariant transformations have been studied for learning symmetries [23,1].
Due to the relevance of group actions in deep learning, geometry is giving a significant contribution to the study of AI [6]. Geometric Deep Learning is indeed trying to produce a geometric unification of several approaches to machine learning, focusing on the concepts of symmetry and invariance. At the intersection between this research field and Topological Data Analysis, it has been proposed to extend the study of the geometry of the space of data to the study of the geometry of the space of the observers/agents that elaborate the data [15,16]. This idea is both natural and relevant, since the interpretation of data depends on the chosen observers, and the approximation of the agents requires the knowledge of the topological and geometric properties of the space such agents belong to, including connectivity, convexity, compactness, curvature and so on.
Recently, the study of group equivariant non-expansive operators (GENEOs) has been proposed in [5] as an interesting topic in machine learning, since these operators model the concept of data observer. Furthermore, they could be seen as multi-level components that can be joined and connected to form neural networks by applying the operations of chaining, convex combination and direct product, opening the path to a new kind of explainable "geometric agent engineering". Therefore, the analysis of the topology and geometry of the spaces of GENEOs could lead to new theoretical results for building neural networks in a more transparent and interpretable way. For example, the use of a suitable topological setting allowed us to prove the compactness of the space of all GENEOs, under the assumption that the space of data is compact [5]. This result has relevant practical consequences, since it guarantees the finite approximability of the spaces of GENEOs, provided that the spaces of data are compact. Incidentally, we stress that the previously stated compactness does not hold for expansive equivariant operators, so justifying our interest in GENEOs.
The development of a good topological and geometric theory of the spaces of GENEOs could indeed produce new methods for approximating external agents in such spaces, suggest how to change such operators without losing their equivariance, benefit from their lattice structure with respect to the operations of maximization and minimization [18], and allow us to manage relations and conflicts that can arise in intelligent structures [14], just to make a few examples. As for the link between GENEOs and Topological Data Analysis (with particular reference to persistent homology), we refer the interested reader to [16,5]. A central role in this link is taken by the so-called natural pseudo-distance [10,11,12,17].
In summary, the use of GENEOs in machine learning requires exploring the global structure of the spaces these operators belong to, in order to fully benefit from their potential. In particular, this paper is devoted to studying the global structure of the space of all linear G-equivariant non-expansive operators for a group G transitively acting on a finite signal domain X, with respect to the concept of permutant measure.
We already know that every linear G-equivariant operator (GEO) can be represented as a G-convolution, provided that the group G is compact and its action is transitive [19]. Unfortunately, the computation of the integral representing such a convolution is usually not trivial, since in many applications the group G is far from being small. In order to solve this problem, a new technique to build GEOs with respect to a group G has been recently proposed, based on the concept of permutant [7,9]. In plain words, a permutant is a collection of automorphisms of the domain X of the signals we are interested in, under the assumption that such a collection is stable under the conjugation action of the group G. A permutant is not required to be a group. When a nonempty permutant H for G is available, we can define a GEO F for G by setting F(ϕ) := Σ_{h∈H} ϕh⁻¹ for every admissible signal ϕ. The main benefit of this procedure is that the permutant H can be much smaller than the group G, so that the computation of the GEO defined by H can be much simpler than the computation of GEOs represented as G-convolutions.
In this paper we extend the definitions of permutant and GEO associated with a permutant by introducing the definitions of permutant measure and GEO associated with a permutant measure. If µ is a permutant measure on the group Aut(X) of all permutations on X, we can define a GEO F_µ for G by setting F_µ(ϕ) := Σ_{h∈Aut(X)} ϕh⁻¹ µ(h) for every admissible signal ϕ. In our mathematical setting we can study the relationship between linear GEOs and GEOs associated with permutant measures. In particular, we can prove that these two concepts coincide, provided that the group G transitively acts on a finite signal domain X. This theorem is based on the Birkhoff–von Neumann decomposition of doubly stochastic matrices and some well known facts in group theory. Its statement makes available a new method to build all linear GEOs with respect to a transitively acting group in the finite setting. As a final step, we get our main result by adapting the previous theorem to the case of GENEOs, by taking into account the non-expansiveness condition.
We stress that this paper is not focused on direct applications of GENEOs, but on a piece of research functional to the development of a good general theory of these operators. Our long-term goal is to make available equivariant operators that are both easily computable and predictable in their behavior, so allowing for their use in geometric deep learning.
The outline of the paper is as follows. In Section 1 we recall the main definitions in our mathematical setting, and introduce the concept of permutant measure together with some of its properties. In Section 2 we prove our main result about GEOs (Theorem 2), which is adapted to GENEOs in Section 3 (Theorem 3). In Section 4 we illustrate a toy example, showing possible advantages of the use of GENEOs built by permutant measures. In Section 5 we conclude the paper with a brief discussion.

Mathematical setting
Let R^X ≅ Rⁿ be the vector space of all functions from a finite set X = {x_1, …, x_n} to R. We recall that R^X has the canonical basis {1_{x_j}}_j, where 1_x : X → R is the function taking the value 1 at x and the value 0 at every point y with y ≠ x. We also consider the group Aut(X) of all permutations on X and a subgroup G of Aut(X). Aut(X) and G naturally act on R^X by composition on the right. We endow R^X with the L^∞-norm ∥ϕ∥_∞ := max_{x∈X} |ϕ(x)|.

Remark 11. If we endow X with the discrete topology, R^X coincides with C⁰(X, R).
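Since X is finite, this setting is easy to encode directly. The sketch below (our own encoding, not taken from the paper) represents a signal ϕ ∈ R^X as a tuple indexed by X = {0, …, n−1} and a permutation g as the tuple of its images, so that the right action is (ϕg)(x) = ϕ(g(x)).

```python
# Signals on a finite X are tuples; permutations act on signals by
# composition on the right: (phi g)(x) = phi(g(x)).

def act(phi, g):
    """Right action of a permutation g on a signal phi.
    g is encoded as a tuple with g[j] = image of point j."""
    return tuple(phi[g[j]] for j in range(len(phi)))

def indicator(x, n):
    """Canonical basis function 1_x of R^X."""
    return tuple(1.0 if j == x else 0.0 for j in range(n))

phi = (3.0, 1.0, 4.0)
g = (1, 2, 0)            # the 3-cycle g(0)=1, g(1)=2, g(2)=0
print(act(phi, g))       # (1.0, 4.0, 3.0)
```

With this encoding, the identity of Remark 11 is immediate: every tuple is a continuous function on a discrete space.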
In this paper we will use the multiplicative notation to denote the composition of functions, and the cycle notation to represent permutations.

Group equivariant operators
We give the following definition.

Definition 11. A Group Equivariant Operator (GEO) for (R^X, G) (with respect to the identity homomorphism id_G : g ↦ g) is a function F : R^X → R^X such that F(ϕg) = F(ϕ)g, for all ϕ ∈ R^X and g ∈ G.
An important subset of the set of GEOs is given by the set of Group Equivariant Non-Expansive Operators (GENEOs), i.e., GEOs F such that ∥F(ϕ_1) − F(ϕ_2)∥_∞ ≤ ∥ϕ_1 − ϕ_2∥_∞, for all ϕ_1, ϕ_2 ∈ R^X. In a more general framework, GEOs and GENEOs can be defined from (R^X, G) to (R^Y, H) with respect to a group homomorphism T : G → H, where equivariance means that F(ϕg) = F(ϕ)T(g), for all ϕ ∈ R^X and g ∈ G.
For further information, we refer the reader to [5]. Obviously, the set of GEOs from (R^X, G) to (R^X, G) is not empty, because it contains at least the identity operator id_{R^X} : ϕ ↦ ϕ.

Permutants and permutant measures
Definition 12. A finite signed measure µ on Aut(X) is called a permutant measure with respect to G if each subset H of Aut(X) is measurable and µ is invariant under the conjugation action of G (i.e., µ(H) = µ(gHg⁻¹) for every g ∈ G). Equivalently, we can say that a signed measure µ on Aut(X) is a permutant measure with respect to G if each singleton {h} ⊆ Aut(X) is measurable and µ({h}) = µ({ghg⁻¹}) for every g ∈ G.
With a slight abuse of notation, we will denote by µ(h) the signed measure of the singleton {h} for each h ∈ Aut(X).
Example 11. Let us consider a positive integer number n and the finite set X = {(cos(2πj/n), sin(2πj/n)) : j ∈ {1, …, n}} ⊆ R². Let G be the group of all rotations of X of an angle α around the point (0, 0), with α a multiple of 2π/n. After fixing an integer number m, let us consider the map h̄ ∈ Aut(X) that takes each point (cos(2πj/n), sin(2πj/n)) to the point (cos(2π(j+m)/n), sin(2π(j+m)/n)). Moreover, we define the function µ_1 : P(Aut(X)) → R that takes each subset C of Aut(X) to 1 if h̄ ∈ C and to 0 if h̄ ∉ C, where P(Aut(X)) is the power set of Aut(X). Since the orbit of h̄ under the conjugation action of G is the singleton {h̄}, the function µ_1 is a permutant measure. We also observe that while the cardinality of G is n, the cardinality of the support supp(µ_1) := {h ∈ Aut(X) : µ_1(h) ≠ 0} of the signed measure µ_1 is 1.
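A quick computational check of this example, with X identified with Z/n and our own tuple encoding (the names shift, compose, inverse are ours), confirms that the conjugation orbit of the distinguished rotation is a singleton.

```python
# Check of Example 11 with X identified with Z/n: G is the group of
# cyclic shifts, the distinguished map is the shift by m, and its
# conjugation orbit under G is a singleton.
n, m = 6, 2

def shift(k, n):
    return tuple((j + k) % n for j in range(n))    # rotation by k steps

def compose(a, b):                 # (a b)(j) = a(b(j))
    return tuple(a[b[j]] for j in range(len(a)))

def inverse(a):
    inv = [0] * len(a)
    for j, i in enumerate(a):
        inv[i] = j
    return tuple(inv)

G = [shift(k, n) for k in range(n)]
h = shift(m, n)

orbit = {compose(compose(g, h), inverse(g)) for g in G}
assert orbit == {h}    # so the measure concentrated on h is a permutant measure
```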
Example 12. Let us consider the set X of the vertices of a cube in R³, and the group G of the orientation-preserving isometries of R³ that take X to X. Let π_1, π_2, π_3 be the three planes that contain the center of mass of X and are parallel to a face of the cube. Let h_i : X → X be the orthogonal symmetry with respect to π_i, for i ∈ {1, 2, 3}. We have that the set {h_1, h_2, h_3} is an orbit under the conjugation action of G. We can now define a permutant measure µ_2 on the group Aut(X) by setting µ_2(h_1) = µ_2(h_2) = µ_2(h_3) = c, where c is a positive real number, and µ_2(h) = 0 for any h ∈ Aut(X) with h ∉ {h_1, h_2, h_3}. We also observe that while the cardinality of G is 24, the cardinality of the support supp(µ_2) := {h ∈ Aut(X) : µ_2(h) ≠ 0} of the signed measure µ_2 is 3.
Permutant measures give a simple method to build GEOs, as shown by the following result.
Proposition 11. If µ is a permutant measure with respect to G, then the map F_µ : R^X → R^X defined by setting F_µ(ϕ) := Σ_{h∈Aut(X)} ϕh⁻¹ µ(h) is a linear GEO.
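Proposition 11 can be checked numerically. The sketch below (our encoding; the choice of µ as the fixed-point-counting class function is ours, made only because any function constant on conjugation orbits works) verifies equivariance on S_3.

```python
# Check of Proposition 11 on a small case: with mu constant on conjugation
# orbits (here: the fixed-point count, a class function), the operator
# F_mu(phi) = sum_h (phi h^{-1}) mu(h) commutes with the action of G.
from itertools import permutations

def compose(a, b):                 # (a b)(j) = a(b(j))
    return tuple(a[b[j]] for j in range(len(a)))

def inverse(a):
    inv = [0] * len(a)
    for j, i in enumerate(a):
        inv[i] = j
    return tuple(inv)

def act(phi, g):                   # right action: (phi g)(x) = phi(g(x))
    return tuple(phi[g[j]] for j in range(len(phi)))

n = 3
AutX = list(permutations(range(n)))
G = AutX                           # G = Aut(X), which acts transitively

def mu(h):                         # number of fixed points: a class function
    return float(sum(1 for j in range(n) if h[j] == j))

def F_mu(phi):
    out = [0.0] * n
    for h in AutX:
        moved = act(phi, inverse(h))        # phi h^{-1}
        for j in range(n):
            out[j] += moved[j] * mu(h)
    return tuple(out)

phi = (2.0, -1.0, 0.5)
for g in G:                        # equivariance: F_mu(phi g) = F_mu(phi) g
    assert F_mu(act(phi, g)) == act(F_mu(phi), g)
```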
Proof. Since Aut(X) linearly acts on R^X by composition on the right, F_µ is linear. Moreover, for every ϕ ∈ R^X and every g ∈ G we have that F_µ(ϕg) = Σ_{h∈Aut(X)} (ϕg)h⁻¹ µ(h) = Σ_{h∈Aut(X)} ϕ(gh⁻¹g⁻¹)g µ(h) = Σ_{f∈Aut(X)} ϕf⁻¹g µ(f) = F_µ(ϕ)g, since µ(h) = µ(ghg⁻¹) and the map h ↦ f := ghg⁻¹ is a bijection from Aut(X) to Aut(X).
Example 13. The GEOs associated with the permutant measures defined in Examples 11 and 12 are respectively F_{µ_1}(ϕ) = ϕh̄⁻¹ and F_{µ_2}(ϕ) = c(ϕh_1⁻¹ + ϕh_2⁻¹ + ϕh_3⁻¹).

It is interesting to observe that the set PM(G) of permutant measures with respect to G is a lattice. Indeed, if µ_1, µ_2 ∈ PM(G), then the measures µ∧, µ∨ on Aut(X), respectively defined by setting µ∧(h) := min{µ_1(h), µ_2(h)} and µ∨(h) := max{µ_1(h), µ_2(h)}, still belong to PM(G). Moreover, if µ ∈ PM(G) then |µ| ∈ PM(G). Furthermore, PM(G) is closed under linear combination. Therefore, PM(G) has a natural structure of real vector space. We can compute the dimension of PM(G) by considering the conjugation action of G on Aut(X).
Proposition 12. The vector space PM(G) is isomorphic to the space R^{Aut(X)/G} of all real-valued functions on the set Aut(X)/G of orbits of the conjugation action of G on Aut(X). In particular, dim PM(G) equals the number of such orbits.

Proof. Consider a permutant measure µ on Aut(X). We define the function f_µ : Aut(X)/G → R by setting f_µ(O) = µ(h), where h ∈ O. Since µ is invariant under the conjugation action of G, f_µ is well defined. One could easily check that the map µ ↦ f_µ is an isomorphism between PM(G) and the space R^{Aut(X)/G} of all real-valued functions on Aut(X)/G. Hence, the statement is proved.
Proposition 12 and the well-known Burnside's Lemma imply that dim PM(G) = (1/|G|) Σ_{g∈G} |Aut(X)^g|. We recall that Aut(X)^g denotes the set of elements fixed by the action of g, i.e., Aut(X)^g := {h ∈ Aut(X) : ghg⁻¹ = h}.
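This count can be verified by brute force on a small example. The following sketch (ours) counts the conjugation orbits of S_3 on itself directly and via Burnside's lemma.

```python
# Brute-force check of the dimension formula
# dim PM(G) = (1/|G|) * sum_g |Aut(X)^g| in the case G = Aut(X) = S_3.
from itertools import permutations

def compose(a, b):
    return tuple(a[b[j]] for j in range(len(a)))

def inverse(a):
    inv = [0] * len(a)
    for j, i in enumerate(a):
        inv[i] = j
    return tuple(inv)

n = 3
AutX = list(permutations(range(n)))
G = AutX

# orbits of the conjugation action, counted directly
orbits = {frozenset(compose(compose(g, h), inverse(g)) for g in G)
          for h in AutX}
n_orbits = len(orbits)

# Burnside's lemma: Aut(X)^g is the set of h commuting with g
fixed = sum(1 for g in G for h in AutX if compose(g, h) == compose(h, g))
assert fixed % len(G) == 0
assert fixed // len(G) == n_orbits == 3    # the 3 conjugacy classes of S_3
```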
Let us now recall the concept of permutant [7,9], which is related to the one of permutant measure.
Definition 13. We say that a subset H of Aut(X) is a permutant for G if ghg⁻¹ ∈ H for every g ∈ G and every h ∈ H.

Note that a subset H of Aut(X) is a permutant for G if and only if H is a union of orbits for the conjugation action of G on Aut(X). It follows that the number of permutants for G is equal to 2^{|Aut(X)/G|}. Let us denote by Perm(G) the set of all permutants for G. From Proposition 12 the next corollary immediately follows.
The following definition extends the one of versatile group (cf. [7]) and is of use in studying permutants.
Definition 14. If k is a positive integer, we say that the group G ⊆ Aut(X) is k-weakly versatile if for every pair (x, z) ∈ X × X with x ≠ z and every subset S of X with |S| ≤ k, a g ∈ G exists such that g(x) = x and g(z) ∉ S.
The previous definition allows us to highlight an interesting property of permutants.
Lemma 11. If G is k-weakly versatile, then every permutant H with H ≠ ∅ and H ≠ {id_X} has cardinality strictly greater than k.
Proof. By contradiction, let us assume that a permutant H exists with H ≠ ∅, H ≠ {id_X} and |H| ≤ k. Since H is non-empty and different from {id_X}, we can choose an h ∈ H and a point x ∈ X such that h(x) = z ≠ x. Let us set S := {h'(x) : h' ∈ H}, so that |S| ≤ |H| ≤ k. Since G is k-weakly versatile, a g ∈ G exists such that g(x) = x and g(z) ∉ S. Since H is a permutant, ghg⁻¹ ∈ H, and hence ghg⁻¹(x) ∈ S. On the other hand, ghg⁻¹(x) = gh(x) = g(z) ∉ S, giving a contradiction.

We stress that when the group G becomes larger and larger the lattice PM(G) becomes smaller and smaller. This duality implies that the method described by Proposition 11 is particularly interesting when G is large. In some sense, this duality is analogous to the one described in [16, Subsection 3.1].

Representation of linear GEOs via permutant measures
A natural question arises from Proposition 11: Which linear GEOs can be represented as GEOs associated with a permutant measure?
We can prove the following result.
Theorem 1. If G transitively acts on X, then for every linear group equivariant operator F for (R^X, G) a permutant measure µ exists such that F(ϕ) = Σ_{h∈Aut(X)} ϕh⁻¹ µ(h) for every ϕ ∈ R^X. Moreover, a function ϕ̂ ∈ R^X exists such that ∥ϕ̂∥_∞ = 1 and ∥F(ϕ̂)∥_∞ = Σ_{h∈Aut(X)} |µ(h)|.

In order to prove this statement, let us consider the matrix B = (b_{ij}) associated with F with respect to the basis {1_{x_1}, …, 1_{x_n}}.

Remark 21. We observe that 1_x h⁻¹ = 1_{h(x)} for every h ∈ Aut(X) and every x ∈ X.
In the following, for every g ∈ G we will denote by σ_g : {1, …, n} → {1, …, n} the function defined by setting σ_g(j) = i if and only if g(x_j) = x_i. We observe that σ_{g⁻¹} = σ_g⁻¹. We need the following lemmas.
Lemma 21. An n-tuple of real numbers α = (α_1, …, α_n) exists such that each row and each column of B can be obtained by permuting α.
Proof. Let us choose a function 1_{x_j} and a permutation g ∈ G. By equivariance we have that F(1_{x_j} g) = F(1_{x_j})g. The left-hand side of the equation can be rewritten as F(1_{x_{σ_g⁻¹(j)}}) = Σ_{i=1}^n b_{iσ_g⁻¹(j)} 1_{x_i}. On the right-hand side we get F(1_{x_j})g = Σ_{i=1}^n b_{ij} 1_{x_i}g = Σ_{i=1}^n b_{ij} 1_{g⁻¹(x_i)} = Σ_{s=1}^n b_{σ_g(s)j} 1_{x_s} by setting x_s = g⁻¹(x_i). Therefore, we obtain the following equation: Σ_{i=1}^n b_{iσ_g⁻¹(j)} 1_{x_i} = Σ_{i=1}^n b_{σ_g(i)j} 1_{x_i}. This immediately implies that b_{iσ_g⁻¹(j)} = b_{σ_g(i)j}, for any i ∈ {1, …, n}. Since this equality holds for any j ∈ {1, …, n} and any g ∈ G, we have that b_{ij} = b_{σ_g(i)σ_g(j)} for every i, j ∈ {1, …, n} and every g ∈ G.
Now we are ready to show that all the rows of B are permutations of the first row, and all the columns are permutations of the first column. Since G is transitive, for every p, q ∈ {1, …, n} there exists g_{pq} ∈ G such that g_{pq}(x_p) = x_q. Consider the ī-th row of B. We know that b_{īj} = b_{σ_{g_{ī1}}(ī)σ_{g_{ī1}}(j)} = b_{1σ_{g_{ī1}}(j)}, for any j ∈ {1, …, n}. Since σ_{g_{ī1}} is a permutation, the ī-th row is a permutation of the first row. By the same arguments, we can assert that every column of B is a permutation of the first column of B.
Let us now consider a real number y, and denote by r(y) (respectively s(y)) the number of times y occurs in each row (respectively column) of B. Both nr(y) and ns(y) represent the number of times y appears in B. Since nr(y) = ns(y), each row and each column contains the same elements (counted with multiplicity). Hence, the statement of our lemma is proved.
The following result is well known [13].
Lemma 22 (Birkhoff–von Neumann decomposition). Let M be an n × n real matrix with non-negative entries, such that both the sum of the elements of each row and the sum of the elements of each column are equal to c. Then for every h ∈ Aut(X) a non-negative real number c(h) exists such that Σ_{h∈Aut(X)} c(h) = c and M = Σ_{h∈Aut(X)} c(h)P(h), where P(h) is the permutation matrix associated with h.
We recall that the permutation matrix associated with the permutation h : X → X is the n × n real matrix (p_{ij}(h)) defined by setting p_{ij}(h) = 1 if h(x_j) = x_i and p_{ij}(h) = 0 if h(x_j) ≠ x_i. Equivalently, we can define the permutation matrix associated with h as the n × n real matrix P(h) such that P(h)e_j = e_{σ_h(j)} for every column vector e_j := ᵗ(0, …, 1, …, 0) ∈ Rⁿ (where 1 is in the j-th position).
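Lemma 22 can be realized algorithmically by repeatedly extracting a permutation supported on the positive entries of the matrix. The sketch below is our own implementation (the helper names are ours, not the paper's): it finds each permutation with an augmenting-path matching and checks the result on the 3 × 3 all-ones matrix discussed in Remark 22.

```python
# Greedy Birkhoff-von Neumann decomposition: write a non-negative matrix
# with equal row and column sums as a weighted sum of permutation matrices.

def perfect_matching(positive):
    """Pick one positive column per row (a system of distinct
    representatives) via augmenting paths; positive[i] is a set of columns."""
    n = len(positive)
    match = [-1] * n                       # match[col] = row, or -1

    def try_row(i, seen):
        for j in positive[i]:
            if j not in seen:
                seen.add(j)
                if match[j] == -1 or try_row(match[j], seen):
                    match[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, set()):
            return None
    perm = [0] * n
    for j, i in enumerate(match):
        perm[i] = j                        # row i has its 1 in column perm[i]
    return tuple(perm)

def birkhoff(M, eps=1e-12):
    """Return a list of (weight, perm) pairs with M = sum weight * P(perm)."""
    M = [row[:] for row in M]
    n = len(M)
    out = []
    while True:
        positive = [{j for j in range(n) if M[i][j] > eps} for i in range(n)]
        if any(not s for s in positive):
            break
        perm = perfect_matching(positive)
        if perm is None:
            break
        w = min(M[i][perm[i]] for i in range(n))
        out.append((w, perm))
        for i in range(n):
            M[i][perm[i]] -= w
    return out

B = [[1.0] * 3 for _ in range(3)]          # all-ones matrix, line sums c = 3
decomp = birkhoff(B)
assert abs(sum(w for w, _ in decomp) - 3.0) < 1e-9   # weights sum to c
R = [[0.0] * 3 for _ in range(3)]          # the weighted sum reconstructs B
for w, p in decomp:
    for i in range(3):
        R[i][p[i]] += w
assert all(abs(R[i][j] - 1.0) < 1e-9 for i in range(3) for j in range(3))
```

Which permutations are extracted depends on the matching order, which is one concrete way to see the non-uniqueness discussed in Remark 22.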
Remark 22. In general, the representation M = Σ_{h∈Aut(X)} c(h)P(h), stated in Lemma 22, is not unique. As an example, consider the set X = {1, 2, 3} and the group G = Aut(X). Let F : R^X → R^X be the linear application that maps 1_j to Σ_{i∈X} 1_i, for any j ∈ X. One could easily check that F is a linear GEO for (R^X, G). Indeed, we have that F(1_j h) = F(1_j) = F(1_j)h for any j ∈ X and any h ∈ Aut(X). The matrix B associated with F with respect to the basis {1_j}_j is the 3 × 3 matrix whose entries are all equal to 1. One could represent B in at least two different ways: B = P(id_X) + P((1 2 3)) + P((1 3 2)) and B = P((1 2)) + P((1 3)) + P((2 3)).

We proceed in our proof of Theorem 1 by taking the linear maps F^⊕, F^⊖ : R^X → R^X defined by setting F^⊕(1_{x_j}) := Σ_{i=1}^n max{b_{ij}, 0} 1_{x_i} and F^⊖(1_{x_j}) := Σ_{i=1}^n max{−b_{ij}, 0} 1_{x_i} for every index j ∈ {1, …, n}. We can easily check that:

1. F^⊕, F^⊖ are linear GEOs;
2. The matrices associated with F^⊕ and F^⊖ with respect to the basis {1_{x_1}, …, 1_{x_n}} are B^⊕ = (max{b_{ij}, 0}) and B^⊖ = (max{−b_{ij}, 0}), respectively (in particular, B^⊕, B^⊖ are non-negative matrices);
3. F = F^⊕ − F^⊖ and B = B^⊕ − B^⊖;
4. Lemma 21 and the definitions of B^⊕, B^⊖ imply that two n-tuples of real numbers α^⊕ = (α^⊕_1, …, α^⊕_n), α^⊖ = (α^⊖_1, …, α^⊖_n) exist such that each row and each column of B^⊕ can be obtained by permuting α^⊕, and each row and each column of B^⊖ can be obtained by permuting α^⊖.
From Property (4) and Lemma 22 this result follows:

Corollary 21. For every h ∈ Aut(X) two non-negative real numbers c^⊕(h), c^⊖(h) exist such that F^⊕(ϕ) = Σ_{h∈Aut(X)} ϕh⁻¹ c^⊕(h) and F^⊖(ϕ) = Σ_{h∈Aut(X)} ϕh⁻¹ c^⊖(h) for every ϕ ∈ R^X.

Proof. Let us start by considering the statement concerning c^⊕ and F^⊕. Without loss of generality, since F^⊕ is linear, it will suffice to prove the existence of a suitable non-negative function c^⊕(h) such that F^⊕(1_{x_j}) = Σ_{h∈Aut(X)} 1_{x_j}h⁻¹ c^⊕(h), for any j ∈ {1, …, n}. The column coordinate vector of the function F^⊕(1_{x_j}) relative to the basis {1_{x_1}, …, 1_{x_n}} is B^⊕ e_j. Property (4) and Lemma 22 imply that for every h ∈ Aut(X) a non-negative real number c^⊕(h) exists, such that B^⊕ e_j = Σ_{h∈Aut(X)} c^⊕(h) P(h) e_j = Σ_{h∈Aut(X)} c^⊕(h) e_{σ_h(j)}. Since the column vector e_{σ_h(j)} represents the column coordinate vector of the function 1_{h(x_j)} = 1_{x_j}h⁻¹ relative to the basis {1_{x_1}, …, 1_{x_n}}, we can conclude that F^⊕(1_{x_j}) = Σ_{h∈Aut(X)} 1_{x_j}h⁻¹ c^⊕(h). The proof of the statement concerning c^⊖ and F^⊖ is analogous.
Remark 23. In general, the function c : Aut(X) → R associated with the Birkhoff–von Neumann decomposition does not induce a permutant measure, i.e., the function µ_c that takes each subset H of Aut(X) to the value µ_c(H) := Σ_{h∈H} c(h) is not necessarily a permutant measure. For example, let us consider the set X = {1, 2, 3, 4} and the group S_4 of all permutations of X. Let us define a linear GEO F : R^X → R^X for (R^X, S_4) by setting F(1_j) = Σ_{i∈X} 1_i, for every index j. After fixing the basis {1_j}_j, the matrix B associated with F is the 4 × 4 matrix whose entries are all equal to 1. As guaranteed by Lemma 22, B can be decomposed as follows: B = P(id_X) + P(σ) + P(σ²) + P(σ³), where σ = (1 2 3 4) ∈ S_4, in cycle notation. Let ⟨σ⟩ be the cyclic group generated by σ. The function c : Aut(X) → R associated with the previous decomposition of B is defined by setting c(h) = 1 if h ∈ ⟨σ⟩ and c(h) = 0 otherwise. Since c(σ²) = 1, while c((1 2)(3 4)) = 0 and (1 2)(3 4) is conjugate to σ² in S_4, c is not invariant under the conjugation action of S_4, and hence µ_c is not a permutant measure.
Let us now go back to the proof of Theorem 1 and consider the functions c^⊕, c^⊖ : Aut(X) → R introduced in Corollary 21. In order to define the permutant measure µ on Aut(X) we will need the next lemma.

Lemma 23. P(g) B^⊕ P(g)⁻¹ = B^⊕ for every g ∈ G.

Proof. Let us consider a permutation g ∈ G. The function R_{g⁻¹} : R^X → R^X, which maps ϕ to ϕg⁻¹, is a linear application. Furthermore, R_{g⁻¹}(1_{x_j}) = 1_{x_j}g⁻¹ = 1_{g(x_j)} for every index j. Hence, the matrix N associated with R_{g⁻¹} with respect to the basis {1_{x_1}, …, 1_{x_n}} verifies the equality N e_j = e_{σ_g(j)}, so that N = P(g) (we set e_j := ᵗ(0, …, 1, …, 0) ∈ Rⁿ, where 1 is in the j-th position). Since F^⊕ is a GEO, the equality F^⊕ ∘ R_{g⁻¹} = R_{g⁻¹} ∘ F^⊕ holds, and hence B^⊕ P(g) = P(g) B^⊕, i.e., P(g) B^⊕ P(g)⁻¹ = B^⊕.

An analogous lemma holds for the matrix B^⊖. Lemma 23 guarantees that P(g) B^⊕ P(g)⁻¹ = B^⊕ for every g ∈ G. From this equality and Lemma 22 it follows that B^⊕ = Σ_{h∈Aut(X)} c^⊕(h) P(g)P(h)P(g)⁻¹ = Σ_{h∈Aut(X)} c^⊕(h) P(ghg⁻¹). Therefore, for every index j we have that B^⊕ e_j = Σ_{h∈Aut(X)} c^⊕(h) e_{σ_{ghg⁻¹}(j)}. This means that F^⊕(1_{x_j}) = Σ_{h∈Aut(X)} c^⊕(h) 1_{x_j}(ghg⁻¹)⁻¹. Since F^⊕ is linear, it follows that F^⊕(ϕ) = Σ_{h∈Aut(X)} c^⊕(h) ϕ(ghg⁻¹)⁻¹ for every ϕ ∈ R^X and every g ∈ G. We observe that the permutations ghg⁻¹ in the previous summation are not guaranteed to be different from each other, for g varying in G and h varying in Aut(X). By averaging over g ∈ G we obtain

F^⊕(ϕ) = (1/|G|) Σ_{g∈G} Σ_{h∈Aut(X)} c^⊕(h) ϕ(ghg⁻¹)⁻¹   (2.4)

for every ϕ ∈ R^X.
For each h ∈ Aut(X), let us consider the orbit O(h) of h under the conjugation action of G on Aut(X), and set µ^⊕(h) := (1/|O(h)|) Σ_{f∈O(h)} c^⊕(f) and µ^⊖(h) := (1/|O(h)|) Σ_{f∈O(h)} c^⊖(f). In other words, we define the measures µ^⊕(h), µ^⊖(h) of each permutation h as the averages of the functions c^⊕, c^⊖ along the orbit of h under the conjugation action of G. Let G_h be the stabilizer subgroup of G with respect to h, i.e., the subgroup of G containing the elements that fix h by conjugation. We recall that by conjugating h with respect to every element of G we obtain each element of the orbit O(h) exactly |G_h| times, and that the well-known relation |G_h||O(h)| = |G| holds (cf. [3]). Let us now set δ(f, h) = 1 if f and h belong to the same orbit under the conjugation action of G, and δ(f, h) = 0 otherwise.
We observe that the following properties hold for f, h ∈ Aut(X): the number of elements g ∈ G such that ghg⁻¹ = f equals |G_h| δ(f, h), and |G_h| = |G|/|O(h)|. Therefore, equality (2.4) implies that F^⊕(ϕ) = Σ_{f∈Aut(X)} (Σ_{h∈Aut(X)} c^⊕(h) δ(f, h)/|O(h)|) ϕf⁻¹ = Σ_{f∈Aut(X)} ϕf⁻¹ µ^⊕(f) for every ϕ ∈ R^X. The definition of µ^⊕ immediately implies that µ^⊕(H) = µ^⊕(gHg⁻¹) for every g ∈ G and every subset H of Aut(X). In other words, µ^⊕ is a non-negative permutant measure with respect to G. Quite analogously, we can prove the equality F^⊖(ϕ) = Σ_{f∈Aut(X)} ϕf⁻¹ µ^⊖(f), and that µ^⊖ is a non-negative permutant measure with respect to G. As a result, the function µ := µ^⊕ − µ^⊖ is a permutant measure and the equality F(ϕ) = Σ_{h∈Aut(X)} ϕh⁻¹ µ(h) holds for every ϕ ∈ R^X. To conclude the proof of Theorem 1, it remains to find a function ϕ̂ with ∥ϕ̂∥_∞ = 1 such that ∥F(ϕ̂)∥_∞ = Σ_{h∈Aut(X)} |µ(h)|. This statement is trivial if F ≡ 0, since in this case µ is the null measure. Hence we can assume that F is not the null map and B is not the null matrix. In order to proceed, we need the next statement.

Proposition 21. If f_1, f_2 ∈ Aut(X) and an index s ∈ {1, …, n} exists, such that f_1(x_s) = f_2(x_s) (i.e., σ_{f_1}(s) = σ_{f_2}(s)), then either c^⊕(f_1) = 0, or c^⊖(f_2) = 0, or both.

Proof. By applying the equality B^⊕ = Σ_{h∈Aut(X)} c^⊕(h)P(h) to the s-th column, and recalling that all the numbers c^⊕(h) are non-negative, we obtain the inequality b^⊕_{σ_{f_1}(s)s} ≥ c^⊕(f_1). Analogously, the inequality b^⊖_{σ_{f_2}(s)s} ≥ c^⊖(f_2) holds. If σ_{f_1}(s) = σ_{f_2}(s), the entries b^⊕_{σ_{f_1}(s)s} and b^⊖_{σ_{f_2}(s)s} cannot both be positive, since b^⊕_{ij} b^⊖_{ij} = 0 for all i, j. Therefore, either c^⊕(f_1) = 0, or c^⊖(f_2) = 0, or both.

By applying Proposition 21 with f_1 = f_2 = h we immediately obtain the following consequence.

Corollary 22. For every h ∈ Aut(X), either c^⊕(h) = 0 or c^⊖(h) = 0.

Let us now set c := c^⊕ − c^⊖. Corollary 22 implies that |c(h)| = c^⊕(h) + c^⊖(h) for every h ∈ Aut(X). The definitions of µ^⊕ and µ^⊖ immediately imply that Σ_{f∈O(h)} |µ(f)| ≤ Σ_{f∈O(h)} |c(f)| for every h ∈ Aut(X), and hence Σ_{h∈Aut(X)} |µ(h)| ≤ Σ_{h∈Aut(X)} |c(h)|. By setting 1_X := Σ_{j=1}^n 1_{x_j} and recalling Corollary 21, we obtain F^⊕(1_X) = (Σ_{h∈Aut(X)} c^⊕(h)) 1_X and F^⊖(1_X) = (Σ_{h∈Aut(X)} c^⊖(h)) 1_X, so that Σ_{h∈Aut(X)} |c(h)| = Σ_{h∈Aut(X)} c^⊕(h) + Σ_{h∈Aut(X)} c^⊖(h) = Σ_{i=1}^n (b^⊕_{i1} + b^⊖_{i1}) = Σ_{i=1}^n |b_{i1}|. By recalling that any line in B is a permutation of the first row of B, we have that Σ_{i=1}^n |b_{i1}| = Σ_{i=1}^n |α_i|. Let us now define ϕ̂ ∈ R^X by setting ϕ̂(x_j) := 1 if b_{1j} ≥ 0 and ϕ̂(x_j) := −1 if b_{1j} < 0, so that ∥ϕ̂∥_∞ = 1. We have that F(ϕ̂)(x_1) = Σ_{j=1}^n b_{1j} ϕ̂(x_j) = Σ_{j=1}^n |b_{1j}| = Σ_{i=1}^n |α_i| ≥ Σ_{h∈Aut(X)} |µ(h)|. For every function ϕ ∈ R^X we have that ∥F(ϕ)∥_∞ ≤ Σ_{h∈Aut(X)} |µ(h)| ∥ϕ∥_∞, so the opposite inequality ∥F(ϕ̂)∥_∞ ≤ Σ_{h∈Aut(X)} |µ(h)| also holds, and we can conclude that ∥F(ϕ̂)∥_∞ = Σ_{h∈Aut(X)} |µ(h)|. This concludes the proof of Theorem 1.

Example 21. The simplest non-trivial example concerning the statement of Theorem 1 can be described as follows. Let X = {1, 2} and G = Aut(X) = {id_X, (1 2)}. Let us consider the linear GEO F : R^X → R^X defined by setting F(1_1) := 1_1 − 1_2 and F(1_2) := 1_2 − 1_1. By defining µ(id_X) := 1 and µ((1 2)) := −1, we get that µ is a permutant measure with respect to G and F(ϕ) = Σ_{h∈Aut(X)} ϕh⁻¹ µ(h) for every ϕ ∈ R^X. Furthermore, Σ_{h∈Aut(X)} |µ(h)| = 2.

We now observe that the assumption that G transitively acts on X cannot be removed from Theorem 1.
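The two-point example can be verified directly; in the sketch below (our encoding) both permutations are involutions, so ϕh⁻¹ = ϕh.

```python
# With mu(id) = 1 and mu((1 2)) = -1, the weighted sum over Aut(X)
# reproduces the operator F of the two-point example.
def act(phi, g):
    return tuple(phi[g[j]] for j in range(len(phi)))

mu = {(0, 1): 1.0, (1, 0): -1.0}       # identity and the transposition

def F(phi):
    out = [0.0, 0.0]
    for h, w in mu.items():
        moved = act(phi, h)            # h is its own inverse here
        out[0] += moved[0] * w
        out[1] += moved[1] * w
    return tuple(out)

assert F((1.0, 0.0)) == (1.0, -1.0)    # F(1_1) = 1_1 - 1_2
assert F((0.0, 1.0)) == (-1.0, 1.0)    # F(1_2) = 1_2 - 1_1
```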
Example 22. Let us consider the set X = {1, 2} and the group G = {id_X} ⊆ Aut(X) = {id_X, (1 2)}. Take the operator F : R^X → R^X defined by setting F(1_i) = 1_1 for any i ∈ X. Although F is a linear GEO, there does not exist a permutant measure µ on Aut(X) such that F(ϕ) = Σ_{h∈Aut(X)} ϕh⁻¹ µ(h) for every ϕ ∈ R^X. By contradiction, let us assume that such a permutant measure µ exists. Then, 1_1 = F(1_1) = µ(id_X) 1_1 + µ((1 2)) 1_2. Since {1_1, 1_2} is a basis for R^X, the equalities µ(id_X) = 1 and µ((1 2)) = 0 must hold. It follows that F(1_2) = µ(id_X) 1_2 + µ((1 2)) 1_1 = 1_2. This contradicts the assumption that F(1_2) = 1_1.
Let us now go back to the linear GEO F considered in Remark 23, where X = {1, 2, 3, 4}, G = S_4, and the function c associated with the Birkhoff–von Neumann decomposition of B takes the value 1 on the elements of ⟨σ⟩ and the value 0 elsewhere. However, the signed measure c is not a permutant measure, since the orbits under the conjugation action of G are the conjugacy classes of S_4. Following the proof of Theorem 1, we can get a permutant measure µ by computing an average on the orbits. In other words, we can set µ(id_X) := 1, µ(h) := 1/3 if h is a 4-cycle or a product of two disjoint transpositions, and µ(h) := 0 otherwise. By making this choice, the equality F(ϕ) = Σ_{h∈Aut(X)} ϕh⁻¹ µ(h) holds for every ϕ ∈ R^X, i.e., F is the linear GEO associated with the permutant measure µ.
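The orbit-averaging step can be reproduced computationally. The sketch below (our encoding, using exact rationals) starts from the coefficients c of Remark 23, supported on the cyclic group generated by σ = (1 2 3 4), averages them over the conjugation orbits of S_4, and checks that the total mass is preserved.

```python
# Orbit averaging in exact arithmetic: c is 1 on the cyclic group <sigma>
# and 0 elsewhere; averaging over conjugation orbits of S_4 gives mu.
from fractions import Fraction
from itertools import permutations

def compose(a, b):
    return tuple(a[b[j]] for j in range(len(a)))

def inverse(a):
    inv = [0] * len(a)
    for j, i in enumerate(a):
        inv[i] = j
    return tuple(inv)

n = 4
AutX = list(permutations(range(n)))
G = AutX                                  # G = S_4
sigma = (1, 2, 3, 0)                      # the 4-cycle (1 2 3 4), 0-based

cyclic = {sigma}                          # the cyclic group generated by sigma
while True:
    nxt = cyclic | {compose(sigma, h) for h in cyclic}
    if nxt == cyclic:
        break
    cyclic = nxt
c = {h: Fraction(1 if h in cyclic else 0) for h in AutX}

def orbit(h):
    return frozenset(compose(compose(g, h), inverse(g)) for g in G)

mu = {}
for h in AutX:
    O = orbit(h)
    mu[h] = sum(c[f] for f in O) / len(O)

# mu is constant on conjugation orbits and preserves the total mass
assert all(mu[h] == mu[compose(compose(g, h), inverse(g))]
           for h in AutX for g in G)
assert sum(mu.values()) == sum(c.values()) == 4
assert mu[sigma] == Fraction(1, 3)        # six 4-cycles share total weight 2
```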
Proposition 11 and Theorem 1 immediately imply the following statement.
Theorem 2. Assume that G ⊆ Aut(X) transitively acts on the finite set X and F is a map from R^X to R^X. The map F is a linear group equivariant operator for (R^X, G) if and only if a permutant measure µ exists such that F(ϕ) = Σ_{h∈Aut(X)} ϕh⁻¹ µ(h) for every ϕ ∈ R^X.

Representation of linear GENEOs via permutant measures
Our main result about the representation of linear GEOs can be adapted to GENEOs.
Theorem 3. Assume that G ⊆ Aut(X) transitively acts on the finite set X and F is a map from R^X to R^X. The map F is a linear group equivariant non-expansive operator for (R^X, G) if and only if a permutant measure µ exists such that F(ϕ) = Σ_{h∈Aut(X)} ϕh⁻¹ µ(h) for every ϕ ∈ R^X, and Σ_{h∈Aut(X)} |µ(h)| ≤ 1.
Proof. If F is a linear group equivariant non-expansive operator for (R^X, G), then Theorem 1 guarantees that in PM(G) a permutant measure µ exists, such that F(ϕ) = Σ_{h∈Aut(X)} ϕh⁻¹ µ(h) for every ϕ ∈ R^X, and ∥F(ϕ̂)∥_∞ = Σ_{h∈Aut(X)} |µ(h)| for a suitable ϕ̂ ∈ R^X with ∥ϕ̂∥_∞ = 1. Since F is non-expansive, the inequality Σ_{h∈Aut(X)} |µ(h)| ≤ 1 follows. This proves the first implication in our statement.
Let us now assume that a permutant measure µ exists such that F(ϕ) = Σ_{h∈Aut(X)} ϕh⁻¹ µ(h) for every ϕ ∈ R^X, with Σ_{h∈Aut(X)} |µ(h)| ≤ 1. Then Proposition 11 states that F is a linear group equivariant operator for (R^X, G). Moreover, ∥F(ϕ_1) − F(ϕ_2)∥_∞ = ∥Σ_{h∈Aut(X)} (ϕ_1 − ϕ_2)h⁻¹ µ(h)∥_∞ ≤ Σ_{h∈Aut(X)} |µ(h)| ∥ϕ_1 − ϕ_2∥_∞ ≤ ∥ϕ_1 − ϕ_2∥_∞ for every ϕ_1, ϕ_2 ∈ R^X. This proves that F is non-expansive, and concludes the proof of the second implication in our statement.
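The criterion of Theorem 3 is easy to test numerically. In the sketch below (our encoding; the sample measure, supported on the derangements of three points, is our own choice of a class function) the measure is normalized so that Σ|µ(h)| = 1, and the associated operator is checked to be non-expansive on a pair of signals.

```python
# Non-expansiveness via Theorem 3's criterion: normalize a class function
# so that the total variation sum |mu(h)| equals 1; the associated
# operator then contracts the sup-norm.
from itertools import permutations

def act(phi, g):
    return tuple(phi[g[j]] for j in range(len(phi)))

def inverse(a):
    inv = [0] * len(a)
    for j, i in enumerate(a):
        inv[i] = j
    return tuple(inv)

n = 3
AutX = list(permutations(range(n)))
raw = {h: 1.0 if all(h[j] != j for j in range(n)) else 0.0 for h in AutX}
total = sum(abs(v) for v in raw.values())      # derangements: the two 3-cycles
mu = {h: v / total for h, v in raw.items()}    # now sum |mu(h)| = 1

def F(phi):
    out = [0.0] * n
    for h, w in mu.items():
        moved = act(phi, inverse(h))           # phi h^{-1}
        for j in range(n):
            out[j] += moved[j] * w
    return tuple(out)

def sup(phi):
    return max(abs(v) for v in phi)

phi1, phi2 = (2.0, -1.0, 0.0), (0.5, 0.5, -3.0)
diff_in = tuple(a - b for a, b in zip(phi1, phi2))
diff_out = tuple(a - b for a, b in zip(F(phi1), F(phi2)))
assert sup(diff_out) <= sup(diff_in) + 1e-12   # F is non-expansive here
```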
How GENEOs based on permutant measures could be used for transforming data
In this section we will illustrate an example of a possible application of GENEOs obtained via permutant measures. The framework is the one described in Example 12, with some extensions. We consider a subset X ⊆ R³ made of points with integer coordinates, belonging to a cubic lattice and discretizing a cube C. Formally, X can be expressed as the product X = {1, …, n}³. We consider again the group G of orientation-preserving isometries that map X into X. As already stated in Example 12, let π_1, π_2, π_3 be the three planes that contain the center of mass of C and are parallel to a face of the cube C. Let h_i : X → X be the orthogonal symmetry with respect to π_i, for i ∈ {1, 2, 3}. Then H_1 = {h_1, h_2, h_3} is a permutant for G. Let us now introduce two new permutants. First, consider the six planes λ_1, …, λ_6, each one containing a pair of edges of C that are symmetric with respect to the center of mass of C. Moreover, let ℓ_i : X → X be the orthogonal symmetry with respect to λ_i for i ∈ {1, 2, 3, 4, 5, 6}. It is easy to check that H_2 = {ℓ_1, …, ℓ_6} is a permutant for G. Lastly, it is trivial to verify that H_3 = {s}, where s denotes the central symmetry with respect to the center of mass of the cube, is another permutant for G.
Given the permutants H_1, H_2 and H_3, three permutant measures µ_{H_1}, µ_{H_2}, µ_{H_3} on Aut(X) can be defined in the following way: µ_{H_i}(h) := 1/|H_i| if h ∈ H_i, and µ_{H_i}(h) := 0 otherwise. By definition, we have that Σ_{h∈Aut(X)} |µ_{H_i}(h)| = 1 for i ∈ {1, 2, 3}. Hence, in force of Theorem 3 we know that the operators defined as F_i(ϕ) := Σ_{h∈Aut(X)} ϕh⁻¹ µ_{H_i}(h) = (1/|H_i|) Σ_{h∈H_i} ϕh⁻¹ are linear GENEOs with respect to G.

Now, let us introduce a classification problem that we will tackle with the help of these GENEOs. We consider functions on X that can be interpreted as 3D scans of playing dice. We denote each point x of X by its indexes in the grid: x = (i, j, k). All the functions ϕ that we will take into account are such that ϕ(i, j, k) = 0 if none of the indexes belongs to the set {1, n}. This means that the functions can be non-zero only on the outer surface of the cube, as we are interested only in the visible part of the die. We observe that the number of points of X that belong to the outer surface of the cube C is n³ − (n − 2)³. Furthermore, we model the dots on the faces of a die as two-dimensional Gaussian spots: in particular, if we consider the coordinates (a, b) on a face of the cube and we center the dot at the point c = (ā, b̄) ∈ {1, …, n}², we obtain the following representation: ϕ_c(a, b) := exp(−((a − ā)² + (b − b̄)²)/(2σ²)). We set the standard deviation σ = 1, and hence ϕ_c is close to 0 when the distance between (a, b) and the center c is strictly greater than 3. Lastly, we can obtain every pattern that is displayed on a face by summing several such functions with different centers c_i = (ā_i, b̄_i). Therefore, a face that shows m dots located at the points {c_i}_{i∈{1,…,m}} has the expression ϕ_m := Σ_{i=1}^m k_i ϕ_{c_i}. (4.1) We introduce the coefficients k_i ∈ [0, 1] to model differences of intensity between the dots. These differences can be seen as noise caused by the scanning procedure. Therefore, in the end, a data function ϕ vanishes inside the cube and coincides with a function ϕ_m for some m and some {c_i}_{i∈{1,…,m}} on each of the six faces of the cube.
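The dot model above can be sketched as follows (our code; the names phi_c and face_pattern follow the text's ϕ_c and ϕ_m, and σ = 1 as in the text).

```python
# The dot model: a Gaussian spot centered at c = (a_bar, b_bar) on a face,
# with sigma = 1, and a face pattern as a weighted sum of spots.
import math

def phi_c(a, b, c, sigma=1.0):
    a_bar, b_bar = c
    return math.exp(-((a - a_bar) ** 2 + (b - b_bar) ** 2) / (2 * sigma ** 2))

def face_pattern(a, b, centers, weights):
    """phi_m: sum of weighted spots with intensities k_i."""
    return sum(k * phi_c(a, b, c) for k, c in zip(weights, centers))

# a one-dot face centered at (13, 13) with intensity k_1 = 1
assert abs(face_pattern(13, 13, [(13, 13)], [1.0]) - 1.0) < 1e-12
assert face_pattern(1, 1, [(13, 13)], [1.0]) < 1e-9   # nearly 0 far from c
```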
In the following experimental part we fix n = 25, n_1 = 6, n_2 = 13 and n_3 = 20. We report some of the configurations that model the patterns of standard dice. The reader can easily guess the missing ones:

- One dot: m = 1 and c_1 = (n_2, n_2).
We will consider two classes of dice. The first one contains exclusively dice in which the sum of the numbers of dots on opposite faces always equals seven: this is the class of the standard dice on the market. The second class is made up of dice where the sum of the numbers of dots on opposite faces is never equal to seven: these dice are fake, since they do not exist on the market. The two classes are distinct with respect to the action of G, i.e., a die of the first class cannot be obtained from a die of the other class through the action of a g ∈ G. Moreover, these two classes are distinct also with respect to the action of the group of all isometries of the cube (not only the ones that preserve the orientation). We generated a dataset of 10000 dice by means of the following procedure: dice computed at odd iterations belong to the first class, while the ones computed at even iterations belong to the second class. At each iteration, a random arrangement of the faces is obtained according to the selected class. Then, for each face, the appropriate function ϕ_m is computed as the sum in (4.1), with coefficients k_i drawn independently from the uniform distribution U([0.6, 1]). Hence, we obtain a function ϕ : X → R. Lastly, a random number p in the set {1, 2, 3, 4, 5} is chosen, and for each index i ∈ {1, . . ., p} a line r_i that is a symmetry axis of the cube C and is orthogonal to two of its faces is randomly chosen. These choices are made with respect to uniform probability distributions. Now, for each index i a rotation of angle π/2 around r_i is applied to the function ϕ, and hence we obtain a new function φ : X → R describing a die. This ensures that all the possible spatial configurations of a die with a face placed on a tabletop can be generated. We note that each function representing a die is characterized by n³ − (n − 2)³ = 3458 distinct values rather than the total n³ = 15625 (this is due to the fact that the function vanishes inside the cube). Figure 1 shows two examples of functions generated by applying the previously described procedure.
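The quarter-turn step of the generation procedure can be sketched as follows; `np.rot90` acting in the plane of two of the three axes realizes a rotation of angle π/2 of the voxel grid (the axis convention is ours, and random values stand in for an actual die function):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 25
phi = rng.random((n, n, n))
phi[1:-1, 1:-1, 1:-1] = 0.0  # data functions vanish inside the cube

p = rng.integers(1, 6)  # number of quarter-turns, p in {1,...,5}
for _ in range(p):
    axis = rng.integers(0, 3)  # symmetry axis orthogonal to two faces
    planes = tuple(ax for ax in range(3) if ax != axis)
    phi = np.rot90(phi, k=1, axes=planes)  # rotate by pi/2 around that axis

# The rotated function is still supported on the outer surface:
surface_points = int(np.count_nonzero(phi))
```

Since each quarter-turn maps the outer surface of the cube to itself, the support of the function is preserved, in accordance with the count n³ − (n − 2)³ = 3458 given above.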
In order to discriminate the data, we designed a pipeline that makes use of GENEOs, and then we compared it to the analogous pipeline that uses only the original data. The pipeline is composed as follows:
1. A GENEO F is obtained as a convex combination of F_1, F_2, F_3 with weights α_1, α_2, α_3, and then F is applied to all the functions of the dataset. (Here we are using the property that the spaces of GENEOs are convex, provided that the spaces of data are convex [5].)
2. Each transformed function ψ : X → R is converted into a vector storing the values of ψ at the points of X belonging to the outer surface of C. This allows us to identify each die with a vector in R^3458. To reduce the dimensionality of these data we apply PCA (Principal Component Analysis), preserving only the first two PCs.
3. With the reduced data, an SVM (Support Vector Machine) classifier with quadratic kernel is trained in order to discriminate the two classes.
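The dimensionality-reduction step can be sketched with a plain-NumPy PCA via SVD; stand-in random vectors replace the actual R^3458 surface vectors, and in practice one would use an off-the-shelf implementation (with, for the third step, a quadratic-kernel SVM such as `SVC(kernel="poly", degree=2)` in scikit-learn):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3458))  # one row per die (stand-in values)

# PCA keeping the first two principal components, computed via SVD:
Xc = X - X.mean(axis=0)                        # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T                             # coordinates on the first 2 PCs
```

The rows of Vt are the principal directions ordered by explained variance, so the first column of X2 carries at least as much variance as the second.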
The pipeline that uses the original data skips the first step and feeds the vectors associated with the original functions directly to the PCA.
To compare the two methods, we randomly split the dataset into a training set of size 7000 and a test set of size 3000. Both subsets have been sampled in order to maintain a balanced distribution of the two classes. This splitting was used just to evaluate the SVM classifiers in the third step, whereas the PCA was computed on the whole dataset in both cases.
Figure 2 shows the results of the dimensionality reduction for both kinds of data. It is clear that, for this specific choice, the outputs of the GENEO F provide a clearer representation of the data: the points are indeed almost separated, whereas with the original data a separation is far less evident. However, it must be noted that not all the GENEOs obtained as convex combinations of F_1, F_2, F_3 provide such a result. Here the parameters α_1, α_2, α_3 have been repeatedly sampled from a Dirichlet distribution in order to select the ones giving the best results (α_1 = 0.318, α_2 = 0.551, α_3 = 0.131).
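The random search over convex combinations can be sketched as follows; weights drawn from a Dirichlet distribution are non-negative and sum to one, so the combination is indeed convex (arbitrary linear maps stand in here for the actual GENEOs F_1, F_2, F_3):

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-ins for the three linear GENEOs, as matrices acting on signal vectors:
F1, F2, F3 = (rng.normal(size=(4, 4)) for _ in range(3))

# A sample from a flat Dirichlet prior gives valid convex-combination weights:
alphas = rng.dirichlet(np.ones(3))
F = alphas[0] * F1 + alphas[1] * F2 + alphas[2] * F3
```

Repeating the draw and scoring the downstream classifier for each sample implements the selection procedure described above.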
The plot also shows that, in the right-hand picture, it is reasonable to expect an almost perfect classification with a quadratic decision boundary. This fact justifies the third step of the pipeline. The decision boundaries learned by the SVM classifiers are shown in Figure 3, while Table 1 reports the confusion matrices and accuracy scores for the two methods.
As we expected, the method employing GENEOs has an accuracy score of 0.955 on the test set, a value significantly higher than that of the method without GENEOs, which is 0.728. This result holds even if we modify some of the hyperparameters. For example, we can change the distribution of the coefficients k_i. In particular, we can consider again a uniform distribution, but with a smaller range containing 1 (i.e., [0.8, 1.0]). In this way we reduce the randomness of the data. Because of this, the plots of Figures 2 and 3 tend to show groups of points that are much more concentrated, and therefore both methods perform better, even though there is still a gap between the two. Furthermore, if we change the number of PCs retained by the PCA, we observe similar results when keeping one to three PCs. From four PCs onward the two methods perform almost equally well. Another possibility is to use a different kernel for the SVM classifiers. For example, one can use a Radial Basis Function (RBF) kernel to obtain more complex decision boundaries. Models capable of learning more complex decision boundaries reduce the performance gap between the two approaches; nonetheless, when considering up to two PCs, the use of GENEOs is still highly beneficial. A summary of these experiments is reported in Table 2.
Fig. 3 Plots of SVM decision boundaries. (a) First method (not using GENEOs); (b) second method (using GENEOs). The PCA results suggest that the simplest choice for classifying the mapped points is to use a quadratic decision boundary. These figures show the boundaries learned by the SVM algorithm with a quadratic kernel for both methods. The right image shows that it is reasonable to expect better classification scores for the method involving GENEOs. Using GENEOs is always beneficial when combined with a quadratic-kernel SVM and a PCA with fewer than four PCs. Otherwise, with an RBF-kernel SVM, GENEOs are beneficial up to two or three PCs, whereas when retaining more PCs the two methods tend to perform equally well.

Discussion
In our paper we have proved that all linear GEOs and GENEOs can be produced by means of a dual method based on the concept of permutant measure with respect to a group G, under the assumption that G acts transitively on a finite set. This method could be particularly useful when we have to deal with a large group G, as frequently happens in real applications. Summations over large groups can indeed present computational difficulties, while summations over the supports of permutant measures are often easier. The use of the set of all permutant measures also benefits from its lattice structure. The availability of the approach we have studied in this paper could be relevant for the application of GEOs and GENEOs as multi-level components in deep learning, and could make the construction of neural networks more transparent and interpretable, according to the mathematical framework proposed in [5]. The next natural step in this line of research is the extension of our approach to topological groups. We plan to study this possible extension in a forthcoming paper.
In Section 4 we have also illustrated an example showing how GENEOs built by permutant measures could be used to extract relevant properties from data. In our opinion, GENEOs of this kind could be of great use in machine learning, since they can inject information about the way data should be managed on the basis of prior knowledge. In the given example, this knowledge was represented by the clear relevance of symmetries in data concerning a cube, leading us to focus our attention on symmetry planes. We are aware of the toy nature of this example; nonetheless, we believe that it is important for two main reasons: first, it shows that a small network of GENEOs can be more informative than its single components; second, it makes clear that GENEOs allow us to obtain simple and explainable representations of data, which are easier to process in a pipeline with other explainable methods. We know that this specific example could be managed in many other ways, without the drastic dimensionality reduction of PCA and with methods more complex than SVM classifiers. Despite this, we showed that GENEOs can improve the results while requiring only minimal information. In the near future, we plan to develop more extended applications of GENEOs built via permutant measures, employing real-world data.

Fig. 1
Fig. 1 Example plot of two generated functions. The left image shows a view of a die belonging to the first class, i.e., a die where the sum of the numbers of dots on opposite faces is always 7, as happens for the dice on the market. The right image shows a view of a die belonging to the second class, i.e., a die where the sum of the numbers of dots on opposite faces is never equal to 7. These dice are not regular. It is possible to see that not all the dots are equally vivid: this is due to the presence of the coefficients k_i. These values model the noise resulting from the scanning procedure.

Fig. 2
Fig. 2 Plots of PCA results. This figure shows the results of the PCA dimensionality reduction for both methods. Blue points are associated with functions belonging to the first class, orange ones with functions belonging to the second class. A separation between the mapped points is clearly more evident in the right figure, which is the one relative to the method involving GENEOs. Hence GENEOs provide a simpler and more informative representation of the data.

Table 1:
Confusion matrices. These tables report the confusion matrices for both methods and for both the training and the test set. The accuracy score is considerably higher for the method that uses GENEOs, on both the training and the test set. Since training and test accuracy scores are very close for both methods, we are confident that each method performs on unseen data as well as on the training data, meaning that neither is overfitting.

Table 2:
Accuracy scores for different hyperparameters. This table shows a comparison between the accuracies of the two methods for different combinations of the hyperparameters.