Homogeneous vector bundles and $G$-equivariant convolutional neural networks

$G$-equivariant convolutional neural networks (GCNNs) are geometric deep learning models for data defined on a homogeneous $G$-space $\mathcal{M}$. GCNNs are designed to respect the global symmetry in $\mathcal{M}$, thereby facilitating learning. In this paper, we analyze GCNNs on homogeneous spaces $\mathcal{M} = G/K$ in the case of unimodular Lie groups $G$ and compact subgroups $K \leq G$. We demonstrate that homogeneous vector bundles are the natural setting for GCNNs. We also use reproducing kernel Hilbert spaces to obtain a precise criterion for expressing $G$-equivariant layers as convolutional layers. This criterion is then rephrased as a bandwidth criterion, leading to even stronger results for some groups.


Introduction
Developments in deep learning have increased dramatically in recent years. Even though multilayer perceptrons [2] and other general-architecture models work well for some tasks, achieving higher levels of performance often requires models that are more tailored to each application, and which incorporate some level of understanding of the data. Geometric deep learning [5,6,7,13,38] is the approach of using inherent geometric structure in data, and symmetry derived from geometry, to improve deep learning models.
Convolutional neural networks (CNNs) are among the simplest and most broadly applicable general-architecture models. They have been successfully applied to image classification and segmentation [41,52,53], text summarization [42], pose estimation [37], sign language recognition [27], and many other tasks. One reason why CNNs are so useful is that convolutional layers, the basic building blocks of CNNs, commute with the translation operator in $\mathbb{Z}^2$; that is, convolutional layers are translation equivariant. In image classification tasks, for instance, $\mathbb{Z}^2$ represents the underlying pixel lattice, and translation equivariance helps CNNs identify objects in images regardless of their exact pixel coordinates. As convolutional layers respect the global translation symmetry in $\mathbb{Z}^2$, CNNs are examples of geometric deep learning models.
G-equivariant convolutional neural networks (GCNNs) [10,12] are generalizations of CNNs to data points defined on homogeneous G-spaces M. Convolutional layers that commute with the action G × M → M of the global symmetry group G remove the need for GCNNs to learn about the global symmetry, since it is already built into the network. This enables GCNNs to focus on learning other relevant features in data, potentially improving performance. One example is the detection of tumors in digital pathology. Images of tumors can have any orientation, and GCNNs with both translation and rotation equivariant layers have higher accuracy than ordinary CNNs [46]. Rotation equivariance is also highly useful in 3D inference problems [49], in point cloud recognition [35], and in other tasks.
Gauge equivariant neural networks [9,13,18,36] are instead designed to respect local symmetries. For example, computations involving vector fields, in meteorology or other areas, require vectors to be expressed in components. This requires a frame: a smooth assignment of a basis to each tangent space. However, the sphere and other non-parallelizable manifolds do not admit a global frame, so the computations must be performed locally, using different local frames for different regions on the manifold. It is then important that any numerical results obtained in one frame are compatible with those obtained in any other frame on overlapping regions. In other words, the computations should be equivariant with respect to the choice of local frame, which is viewed as a gauge degree of freedom, i.e., a local symmetry. Gauge equivariant neural networks have also been introduced for problems exhibiting other local symmetries, primarily in lattice gauge theory.
In this paper, we study the mathematical foundations of GCNNs and characterize convolutional layers in terms of more abstract layers. Our contributions are threefold:
• We analyze a general framework that includes both gauge equivariant neural networks and GCNNs, which differ only in whether layers respect a local gauge symmetry or a global translation symmetry. Moreover, we show that GCNNs are naturally expressed in terms of homogeneous vector bundles.
• In general, not all G-equivariant layers can be written as convolutional layers. We investigate the relation between these types of layers for all homogeneous spaces M = G/K when G is a unimodular Lie group and K ≤ G is a compact subgroup. As a result of this investigation, we find a criterion for expressing G-equivariant layers as convolutional layers (Theorem 14).
• We highlight the close relationship between convolutional layers in GCNNs, reproducing kernel Hilbert spaces (RKHS), and bandwidth. We reformulate the criterion in Theorem 14 as a bandwidth criterion and prove that, when G is discrete abelian or finite, all G-equivariant layers are indeed convolutional layers (Corollaries 19-21).
This work was inspired by a number of papers [9,10,11,12,13,48]. The theoretical papers [9,12] have been of particular importance, as our work grew from a desire to understand the mathematics of equivariant neural networks in even greater detail. In the case of compact groups G, the Peter-Weyl theorem and other powerful tools have allowed researchers to study GCNNs using harmonic analysis. Among the most well-known results in this direction is Theorem 1 in [28], which uses Fourier analysis on G to establish that the layers in a G-equivariant feed-forward neural network must be generalized convolutional layers, when G is compact. This result is similar to our second contribution above and we discuss the distinction in Section 3.4. Others have used the well-known representation theory of the compact group G = SO(3) to study rotation equivariant GCNNs for spherical data [15,16,17].
The paper is structured as follows. We summarize the relevant machine learning background in Section 2.1, and discuss a framework for equivariant neural networks in Section 2.2. In Section 3, we restrict attention to homogeneous spaces G/K where G is a unimodular Lie group and K ≤ G is a compact subgroup. Section 3.2 explains the relation between GCNNs, homogeneous vector bundles, and induced representations. This relation is used to motivate the definition of G-equivariant layers in Section 3.4, where we also discuss convolutional layers and prove the aforementioned Theorem 14; this result characterizes when a G-equivariant layer is a convolutional layer, in terms of RKHS. Section 3.5 then relates RKHS to bandwidth, leading to a reformulation of Theorem 14 (Corollary 19) as well as a few stronger results. Finally, in Section 4, we summarize our work and end with a discussion.

Foundations of equivariant neural networks
In this section, we give an introduction to convolutional neural networks (CNNs) and discuss a simple framework for equivariant neural networks.
2.1. Convolutional neural networks. CNNs were first introduced in 1979 under the name of Neocognitrons, and were used to study visual pattern recognition [21]. In the 1990s, CNNs were successfully applied to problems such as automatic recognition of handwritten digits [33] and face recognition [32]. However, it was arguably not until 2012, when the GPU-based AlexNet CNN outperformed all competition on the ImageNet Large Scale Visual Recognition Challenge [30], that CNNs and other neural networks truly caught the public eye. Industrial work and academic research on deep learning have since soared, and current state-of-the-art deep learning architectures are significantly more powerful and more complex than AlexNet. Yet, convolutional layers remain important components.
In this introduction, we focus on data that can be represented by finitely supported functions
$$f : \mathbb{Z}^2 \to \mathbb{R}^m. \tag{2.1}$$
Digital images, for example, are of this form since each pixel $x \in \mathbb{Z}^2$ is associated with a color array $f(x) \in \mathbb{R}^m$, and finite support is analogous to finite image resolution. Note that m = 1 corresponds to grayscale images and m = 3 to RGB images, but we allow any number of channels m. In general, any data represented by a finite 2D (m = 1) or 3D (m > 1) array with real-valued entries is of the form (2.1).
Convolutional layers act on data points (2.1) by
$$(\kappa \star f)(x) = \sum_{y \in \mathbb{Z}^2} \kappa(y - x) f(y), \tag{2.2}$$
given a matrix-valued kernel $\kappa : \mathbb{Z}^2 \to \mathrm{Hom}(\mathbb{R}^m, \mathbb{R}^n)$ for some $n \in \mathbb{N}$. The kernel is also finitely supported in practice, so the maps $\kappa \star f : \mathbb{Z}^2 \to \mathbb{R}^n$ are themselves data points (2.1) with n channels. Broadly speaking, CNNs consist of convolutional layers (2.2) combined with other transformations, such as non-linear activation functions and batch normalization layers. We are mainly interested in convolutional layers, so we do not go into detail about non-linear activation functions or other types of layers. For more extensive descriptions of CNNs, see [1,24,51].
In image classification tasks, for instance, CNNs categorize digital images into a predefined number k of distinct classes, based on what the images depict. The CNN maps each digital image $f : \mathbb{Z}^2 \to \mathbb{R}^m$ to a probability vector in $\mathbb{R}^k$ estimating the probability that f belongs to any given class. During training, this probability vector is compared to the correct answer (which is known) and the discrepancy is computed using a loss function, such as a norm or distance function. A gradient descent-based algorithm minimizes the loss function, thereby learning the kernel matrix elements and any other trainable network parameters. The result of this training procedure is a CNN that accurately classifies images in the training data set. Finally, the predictive power of the CNN is evaluated by using it to classify images from a test data set, that is, images that were not used during training and which the CNN has not encountered before.
CNNs perform very well on image classification and similar machine learning tasks, and are important parts of many state-of-the-art network architectures on such tasks [4,25,44,50]. One reason for their success is translation equivariance: Convolutional layers (2.2) commute with the translation operator $(L_x f)(y) = f(y - x)$ in the image plane,
$$\kappa \star (L_x f) = L_x (\kappa \star f), \qquad x \in \mathbb{Z}^2. \tag{2.3}$$
Translation equivariance makes CNNs agnostic to the specific locations of individual pixels, while still taking into account the relative positions of different pixels; images are more easily classified based on relevant features of their subjects, and not based on technical artifacts such as specific pixel coordinates. This observation motivates the introduction of more general convolutional layers that act equivariantly on data points f : M → V, where the domain M is homogeneous with respect to a locally compact group G [10,12]. Given finite-dimensional vector spaces V, W, convolutional layers are defined as certain vector-valued integrals (2.4) with operator-valued kernels κ : G → Hom(V, W).
Remark 1. In (2.4), we integrate with respect to a Haar measure on the unimodular Lie group G.
Broadly speaking, G-equivariant convolutional neural networks (GCNNs) consist of sequences of convolutional layers (2.4) mixed with non-linear activation functions, and possibly other layers that are equivariant with respect to the global symmetry. This characterization is intentionally vague as we want to avoid making unnecessarily restrictive assumptions on the layers. For this reason, we will not study GCNNs from a holistic perspective, as a sequence of multiple layers, but instead focus on individual layers. We give a formal definition of abstract, G-equivariant layers in Definition 9, before defining the more specific convolutional layers in Definition 10.
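The translation equivariance of convolutional layers discussed above can be checked numerically. The following Python sketch is an illustration added here, not part of the original development. It assumes a periodic grid $\mathbb{Z}_N \times \mathbb{Z}_N$ as a finite stand-in for $\mathbb{Z}^2$, the cross-correlation convention $(\kappa \star f)(x) = \sum_y \kappa(y - x) f(y)$, and hypothetical helper names `correlate_periodic` and `translate`.

```python
import numpy as np

def correlate_periodic(kernel, f):
    """Compute (kappa * f)(x) = sum_y kappa(y - x) f(y) on the torus Z_N x Z_N.

    kernel: shape (N, N, n, m); kernel[d] is an n x m matrix for each offset d.
    f:      shape (N, N, m); an m-channel data point.
    """
    N = f.shape[0]
    out = np.zeros((N, N, kernel.shape[2]))
    for dx in range(N):
        for dy in range(N):
            # Rolling by -d gives the array x -> f(x + d), i.e. y = x + d.
            shifted = np.roll(f, shift=(-dx, -dy), axis=(0, 1))
            # Apply the matrix kernel[d] to the channel vector at every pixel.
            out += shifted @ kernel[dx, dy].T
    return out

def translate(f, t):
    """Translation operator (L_t f)(x) = f(x - t) on the periodic grid."""
    return np.roll(f, shift=t, axis=(0, 1))

# Assumed toy sizes: a 6 x 6 grid, m = 3 input channels, n = 2 output channels.
N, m, n = 6, 3, 2
rng = np.random.default_rng(0)
kernel = rng.normal(size=(N, N, n, m))
f = rng.normal(size=(N, N, m))
t = (2, 5)

lhs = correlate_periodic(kernel, translate(f, t))   # kappa * (L_t f)
rhs = translate(correlate_periodic(kernel, f), t)   # L_t (kappa * f)
print(np.allclose(lhs, rhs))
```

The two sides agree because shifting the input merely relabels the summation index, which is the finite analogue of invariance of the counting measure on $\mathbb{Z}^2$.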

2.2. Gauge theory and the equivariant framework. Before going into detail about GCNNs in Section 3, let us describe a mathematical framework for equivariant neural networks. The framework is based on gauge theoretic concepts but is equally suitable for GCNNs. Both gauge equivariant neural networks and GCNNs will thus be described by this framework, their main difference being the specific equivariance properties imposed on layers.
Remark 2. This framework is already being used in GCNNs and gauge equivariant neural networks separately [9,12]. We are simply presenting the unified theory that includes both types of equivariance as separate cases.
Gauge theory originated in physics as a way to model local symmetry. In quantum electrodynamics (QED), for example, the electron wave function can be locally phase shifted, $\psi \mapsto e^{i\alpha}\psi$, with no physically observable consequence, and so QED is said to possess a U(1) gauge symmetry. Mathematicians later adopted gauge theory in order to study other types of local symmetries. The introduction of gauge equivariant deep learning models has been suggested by deep learning practitioners and physicists alike. For example, [9] investigates the structure of gauge equivariant layers used for vector fields, tensor fields, and more general fields. Physicists have introduced gauge equivariant neural networks for applications in, e.g., lattice gauge theory [3,18,36].
We assume some familiarity with fiber bundles, but we still present a few relevant definitions and examples.
Definition 1. Let K be a Lie group. A smooth fiber bundle π : P → M is called a principal K-bundle with structure group K if there is a free, smooth right K-action
$$P \times K \to P, \qquad (p, k) \mapsto p \triangleright k,$$
with the following properties for each x ∈ M.
(i) Let $P_x = \pi^{-1}(\{x\})$ be the fiber at x. Then $p \triangleright k \in P_x$ for all $p \in P_x$, $k \in K$. That is, the K-action preserves fibers.
(ii) For each $p \in P_x$, the mapping $k \mapsto p \triangleright k$ is a diffeomorphism $K \to P_x$.
Principal bundles are natural tools for understanding local symmetries, i.e., gauge degrees of freedom. In theoretical physics, gauge degrees of freedom are redundancies in the mathematical theory with no physical relevance. This is both a blessing and a curse: Solving the Yang-Mills equations of motion as an initial value problem, for example, is an underdetermined problem that cannot be solved without taking the gauge degrees of freedom into account, that is, without choosing a gauge [39]. This is similar to our example in the introduction, that computations involving vector fields may require a choice of basis in each tangent space, even if this choice is irrelevant for the underlying application. On the other hand, problems may also become easier to solve by choosing a gauge with some finesse.
Definition 2. Let π : P → M be a principal K-bundle and assume U ⊆ M is open.
(i) A gauge is a local section ω : U → P .
We will go into more detail about the vector field example in Example 1. However, we first need to define associated bundles. To this end, let π : P → M be a principal K-bundle and let $\rho : K \to GL(V_\rho)$ be a finite-dimensional representation. Define an equivalence relation ∼ on $P \times V_\rho$ by
$$(p, v) \sim (p \triangleright k, \rho(k^{-1}) v), \qquad k \in K.$$
Let $P \times_\rho V_\rho = (P \times V_\rho)/\sim$ denote the quotient space, whose elements are equivalence classes $[p, v]$, and consider the projection
$$\pi_\rho : P \times_\rho V_\rho \to M, \qquad [p, v] \mapsto \pi(p).$$
Observe that each fiber $\pi_\rho^{-1}(\{x\})$ has a natural vector space structure such that the mapping
$$V_\rho \to \pi_\rho^{-1}(\{x\}), \qquad v \mapsto [p, v],$$
is a linear isomorphism for each fixed $p \in P_x$.
Moreover, (2.11) defines a local frame ω : U → F M that sends each point x ∈ U to its coordinate basis in T x M. Local frames are sections of the frame bundle F M, which is a principal GL(d, R)-bundle, hence local frames are examples of gauges. Fix x ∈ U and expand X x ∈ T x M in the coordinate basis: By decomposing (2.12) into components and basis vectors, we can view the tangent vector X x as one giant tuple Another choice of coordinate chart produces another local frame ω ′ : U ′ → F M and another decomposition (2.12), assuming that x ∈ U ′ . These decompositions are related by a change of basis (2.14) for some B(x) ∈ GL(d, R). Now observe that (2.14) is of the form . The basis-dependent description (2.12) of X x thus resemble the pairs (p, v) in the construction of associated bundles. Passing to the quotient (F M × ρ R d )/ ∼ instead gives a basis-independent description of X x , since it identifies all possible decompositions (2.12) in all possible bases. That is, the tangent bundle is isomorphic to F M × ρ R d .
Equivariant neural networks use the language of principal and associated bundles. In the remainder of this subsection, let E ρ = P × ρ V ρ and E σ = P × σ V σ be associated bundles, given a principal bundle π : P → M over a smooth manifold M. Further let Γ c (E ρ ) and Γ c (E σ ) be the vector spaces of compactly supported continuous sections of E ρ and E σ , respectively.

Remark 3.
Even though data is typically real-valued, we primarily consider complex representations $(\rho, V_\rho)$ in order to simplify the mathematical theory. The harmonic analysis in Section 3.5 especially benefits from this choice.
Our decision to restrict attention to compactly supported sections was also made for mathematical reasons: G-equivariant layers are defined in Section 3.4 in terms of an induced representation, which lives on the completion of Γ c (E ρ ) with respect to a certain inner product. This is not a serious restriction from an application viewpoint.

Definition 4.
A feature map is a compactly supported continuous map $f : P \to V_\rho$ that satisfies the transformation property
$$f(p \triangleright k) = \rho(k^{-1}) f(p) \tag{2.16}$$
for all p ∈ P, k ∈ K. The vector space of such feature maps is denoted $C_c(P; \rho)$.
Data points and feature maps are, in a sense, dual to each other: Each data point $s \in \Gamma_c(E_\rho)$ is of the form
$$s_f(x) = [p, f(p)] \tag{2.17}$$
for a feature map $f \in C_c(P; \rho)$, where $p \in P_x$ is any element of the fiber at x ∈ M.
Note that (2.17) does not depend on the choice of p: Given another element $p' \in P_x$, there exists a unique k ∈ K such that $p' = p \triangleright k$ and
$$[p', f(p')] = [p \triangleright k, \rho(k^{-1}) f(p)] = [p, f(p)].$$
That is, the equivalence class $[p, f(p)]$ only depends on the basepoint x.
We are almost ready to define general and gauge equivariant layers. Before doing so, however, we must say how gauge transformations χ : P → P act on data points. Let θ χ : P → K be the uniquely defined map satisfying χ(p) = p ⊳ θ χ (p) for all p ∈ P , and define the following action on the associated bundle E ρ : The corresponding action on data points is given by We distinguish between general layers and more specific gauge equivariant layers, as G-equivariant layers in GCNNs will only be a special case of the former.
In equivariant neural networks, data points are sent through a sequence of layers, which are mixed with non-linear activation functions. Again, we focus on individual layers in this paper, and leave the analysis of equivariant activation functions and multi-layer networks for an upcoming paper [23]. The fiber bundle-theoretic concepts discussed in this part describe two kinds of equivariant neural networks: (i) Gauge equivariant neural networks, which respect local gauge symmetry and whose layers are gauge equivariant. (ii) GCNNs, which respect global translation symmetry in homogeneous G-spaces M, and whose layers are G-equivariant (Definition 9).

Remark 4.
Our definition of layers is almost identical to the linear maps in [9]. The difference is that we focus on compactly supported sections, whereas [9] uses sections that are supported on a single coordinate chart. Also, [9] investigates the structure of its linear maps under additional assumptions of so-called locality, covariance, and weight-sharing. Covariance is analogous to gauge equivariance in our setting.
A consequence of Lemma 2 is that (gauge equivariant) layers $\Phi : \Gamma_c(E_\rho) \to \Gamma_c(E_\sigma)$ correspond to linear maps $\phi : C_c(P; \rho) \to C_c(P; \sigma)$ such that $\Phi s_f = s_{\phi f}$. Writing data points as $s_f = [\cdot, f]$ allows us to also express this relation as $\Phi[\cdot, f] = [\cdot, \phi f]$. We think of Φ and φ as two sides of the same coin, and use the name (gauge equivariant) layer for both maps.

Example 2. Let $T : V_\rho \to V_\sigma$ be a linear transformation and consider the layer
$$(\phi f)(p) = T f(p)$$
for p ∈ P, $f \in C_c(P; \rho)$. Since f and φf are feature maps and thereby satisfy (2.16), the linear transformation T must satisfy
$$T \rho(k^{-1}) f(p) = \sigma(k^{-1}) T f(p)$$
for all k ∈ K, p ∈ P, $f \in C_c(P; \rho)$. This can be seen to imply that σ ∘ T = T ∘ ρ, so T intertwines the representations ρ and σ. Another way to arrive at this conclusion is to analyze when the corresponding layer is well-defined. Now consider a gauge transformation χ : P → P and its induced map $\theta_\chi : P \to K$. Because T is an intertwiner, $T \rho(\theta_\chi(p)) = \sigma(\theta_\chi(p)) T$ for every p ∈ P, hence the layer Φ is automatically gauge equivariant.
As this example illustrates, gauge equivariance is tightly connected to intertwining properties of φ. Rearranging (2.25) gives the following result.
for all gauge transformations χ : P → P and all feature maps f ∈ C c (P ; ρ).
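The intertwining condition σ ∘ T = T ∘ ρ from the example above can be made concrete with a small numerical check. The following Python sketch is an added illustration under stated assumptions: the structure group is K = SO(2), both representations are the standard rotation representation on $\mathbb{R}^2$, and the principal bundle is P = K over a single point, so that feature maps are functions of one angle. All names (`rotation`, `is_intertwiner`, `feature_map`, `preserves_transformation_law`) are hypothetical.

```python
import numpy as np

def rotation(theta):
    """Standard representation of SO(2) on R^2."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rho = rotation    # assumed representation on V_rho = R^2
sigma = rotation  # assumed representation on V_sigma = R^2

def is_intertwiner(T, angles):
    """Check sigma(k) T = T rho(k) on a sample of group elements."""
    return all(np.allclose(sigma(a) @ T, T @ rho(a)) for a in angles)

def feature_map(v):
    """f(theta) = rho(theta)^{-1} v satisfies f(theta + a) = rho(a)^{-1} f(theta)."""
    return lambda theta: rho(-theta) @ v

def preserves_transformation_law(T, v, angles):
    """Check that phi f = T f again satisfies the feature map property for sigma."""
    f = feature_map(v)
    phi_f = lambda theta: T @ f(theta)
    return all(
        np.allclose(phi_f(t + a), sigma(-a) @ phi_f(t))
        for t in angles for a in angles
    )

angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
v = np.array([1.0, 0.5])

# a*I + b*J commutes with every rotation; a generic diagonal matrix does not.
T_good = 2.0 * np.eye(2) + 0.5 * rotation(np.pi / 2)
T_bad = np.array([[1.0, 0.0], [0.0, 2.0]])

print(is_intertwiner(T_good, angles), preserves_transformation_law(T_good, v, angles))
print(is_intertwiner(T_bad, angles), preserves_transformation_law(T_bad, v, angles))
```

In this toy setting, the pointwise layer maps feature maps to feature maps exactly when T is an intertwiner, which mirrors the well-definedness argument in the example.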
This concludes our discussion of gauge theory and of equivariant neural networks. The framework for the latter is evidently very general, consisting of layers and nonlinear activation functions between data points. There are advantages of working at this level of generality: Ordinary (non-equivariant) neural networks have a multitude of different types of layers, many of them linear. Equivariant analogues of such layers are likely to satisfy either Definition 5(ii) or Definition 9, depending on the relevant type of equivariance. Any result that can be proven using this general framework, will thus be true for many different instances of equivariant neural networks. One example is Theorem 14 below, that characterizes the structure of abstract G-equivariant layers in any GCNN. 8

G-equivariant convolutional neural networks
Recall that GCNNs generalize ordinary CNNs to data points f : M → V defined on homogeneous G-spaces M. Let us give a brief recap on homogeneous spaces and global symmetry, before moving on to discuss homogeneous vector bundles, sections, and induced representations. We will demonstrate that GCNNs and G-equivariant layers (originally defined in [12]) are most naturally understood from the perspective of homogeneous vector bundles. We then use reproducing kernel Hilbert spaces and bandwidth to understand which G-equivariant layers are expressible as convolutional layers.
Since the action (3.1) is transitive, we may choose an arbitrary basepoint x 0 ∈ M and express any other point x ∈ M as x = g · x 0 for some g ∈ G. This group element is typically not unique, but observe that In other words, there is a one-to-one correspondence between points x ∈ M and left cosets gH x0 ∈ G/H x0 .
is an equivariant diffeomorphism.
Homogeneous spaces are globally symmetric in the sense that any point $x_0 \in \mathcal{M}$ may be chosen as basepoint. Given another choice of basepoint $x_0' \in \mathcal{M}$, the spaces $G/H_{x_0'} \simeq G/H_{x_0}$ are diffeomorphically related by a translation in G; more precisely, by the composition $F_{x_0}^{-1} \circ F_{x_0'}$. Euclidean space $\mathcal{M} = \mathbb{R}^d$, for example, possesses a global translation symmetry, allowing any point to be considered as origin. Similarly, the rotationally symmetric sphere $\mathcal{M} = S^2$ does not have a unique north pole.
We end this part with the following proposition, which is instrumental in relating homogeneous vector bundles to the equivariance framework in Section 2.2.

3.2. Homogeneous vector bundles.
Vector bundles may inherit global symmetry from a homogeneous base space; the transitive action $(g, x) \mapsto g \cdot x$ may induce linear maps $E_x \to E_{gx}$ between fibers. Such bundles are naturally called homogeneous and, because this symmetry is also encoded in their sections (data points), we will show that homogeneous vector bundles are the natural setting for studying GCNNs.
From this point on, we restrict attention to homogeneous spaces M = G/K where G is a unimodular Lie group and K ≤ G is a compact subgroup. Elements of the homogeneous space are interchangeably denoted as x ∈ M or gK ∈ G/K.

Remark 5.
Examples of unimodular Lie groups include all finite, discrete, compact, or abelian Lie groups, the Euclidean groups, and many others. See [19,20] for details.  5) and such that the induced map L g,x : E x → E gx is linear, for all g ∈ G, x ∈ M.
Example 3. The frame bundle F M is a homogeneous vector bundle whenever M is a homogeneous space, and the same is true of any associated bundle F M × ρ V ρ . In particular, the tangent bundle T M is a homogeneous vector bundle.
Example 4. If $(\rho, V_\rho)$ is a finite-dimensional K-representation, then the associated bundle $E_\rho = G \times_\rho V_\rho$ is a homogeneous vector bundle with respect to the left action
$$g \cdot [g', v] = [g g', v].$$
All homogeneous vector bundles E are of the form $G \times_\rho V_\rho$, up to isomorphism. To understand why, consider the fiber $E_K = E_{eK}$ and observe that the restriction of (3.5) to $E_K$ and elements k ∈ K yields invertible linear maps
$$L_k : E_K \to E_K.$$
The defining properties of group actions ensure that $\rho(k) = L_k$ is a finite-dimensional K-representation on $E_K$. Moreover, because the linear maps $L_{g,x}$ are isomorphisms, any element $v'$ of any fiber $E_x$ can be obtained as the image $v' = L_{g,K}(v) =: L_g(v)$ for some choices of $g \in q^{-1}(\{x\})$ and $v \in E_K$. The mapping
$$\xi : G \times E_K \to E, \qquad (g, v) \mapsto L_g(v),$$
is thus surjective. It is not injective, though, since the relation
$$\xi(gk, \rho(k^{-1}) v) = L_{gk} L_{k^{-1}}(v) = L_g(v) = \xi(g, v)$$
holds for all k ∈ K. However, the same argument shows that ξ is made injective by passing to the quotient $G \times_\rho E_K$.
is an isomorphism of homogeneous vector bundles.
We now have two perspectives on bundles G × ρ V ρ : As bundles associated to the principal bundle P = G, and as homogeneous vector bundles (up to isomorphism). The former perspective offers a connection to the framework in Section 2.2, whereas the latter motivates the definition of G-equivariant layers in Section 3.4 below.
3.3. Induced representations. Let us show the relationship between homogeneous vector bundles and induced representations, which will be an essential ingredient in the definition of G-equivariant layers. To this end, let (ρ, V ρ ) be a finite-dimensional unitary K-representation and consider the homogeneous vector bundle E ρ = G× ρ V ρ .
We will need inner products on Γ c (E ρ ) and C c (G; ρ), the former of which is defined using the following unitary structure: defines a complete inner product on each fiber E gK , making E ρ into a Hilbert bundle with L g,x unitary. This unitary structure is unique in that, if we identify V ρ with E K in the canonical manner, then the inner product on V ρ so induced agrees with , ρ .
We also need the following measure on G/K: There is a unique G-invariant, nonzero Radon measure dx on G/K such that the following quotient integral formula holds for every f ∈ C c (G): Using these two ingredients, we make Γ c (E ρ ) into a pre-Hilbert space with respect to the inner product and we denote its completion L 2 (E ρ ). Similarly, C c (G; ρ) is a pre-Hilbert space with respect to the inner product the completion of which is denoted L 2 (G; ρ).
are called induced representations, or representations induced by ρ.
Both ind G K ρ and Ind G K ρ are unitary [47, 5.3.2] and may be identified: Lemma 9. The induced representations ind G K (ρ), Ind G K (ρ) are unitarily equivalent. Proof. This is [47, 5.3.4], but let us write down a proof for clarity. First observe that the isomorphism C c (G; ρ) → Γ c (E ρ ), f → s f is unitary, which follows by combining the quotient integral formula (3.12), the unitarity of ρ, and the compactness of K: For all f, f ′ ∈ C c (G; ρ), the map g → f (g), f ′ (g) ρ lies in C c (G) and so (3.17) The same map f → s f satisfies so it extends to a unitary isomorphism L 2 (G; ρ) → L 2 (E ρ ) intertwining the induced representations.
To gain a better understanding of the induced representations, consider the Bochner space L 2 (G, V ), the space of square-integrable functions f : G → V that take values in a finite-dimensional Hilbert space V . It is itself a Hilbert space with inner product The induced representation (Ind G K ρ, L 2 (G; ρ)) is nothing but the restriction of the left regular representation Λ on L 2 (G, V ρ ) to a closed, invariant subspace. Furthermore, Λ is intimately related to the left regular representation λ on L 2 (G), as the following lemma shows. The proof of this lemma is a short calculation.
Lemma 10. Let V be a finite-dimensional Hilbert space and equip L 2 (G) ⊗ V with the tensor product inner product. Then the natural unitary isomorphism This lemma also shows that, if we choose an orthonormal basis e 1 , . . . , e dim V ∈ V , elements of L 2 (G, V ) are simply linear combinations f = i f i e i with component functions f i ∈ L 2 (G). We use this fact in some calculations of vector-valued integrals, and the component functions will also be important in Section 3.5.

3.4. G-equivariant and convolutional layers.
Given a homogeneous G-space M, we observed that vector bundles π : E → M may inherit the global symmetry of M. We took a closer look at such homogeneous vector bundles and found that they are isomorphic to associated bundles G × ρ V ρ , and therefore fit within the equivariance framework of Section 2.2 . We also saw how the global symmetry of M is encoded in data points and feature maps via induced representations, and we want G-equivariant layers to preserve this global symmetry.
Consider homogeneous vector bundles $E_\rho = G \times_\rho V_\rho$ and $E_\sigma = G \times_\sigma V_\sigma$, and recall Definition 5 of layers as general linear maps $\Phi : \Gamma_c(E_\rho) \to \Gamma_c(E_\sigma)$. We are mainly interested in bounded layers from an application point of view, and we can make this restriction now that the domain and codomain are normed spaces. Furthermore, any bounded layer can be uniquely extended to a bounded linear map
$$\Phi : L^2(E_\rho) \to L^2(E_\sigma),$$
and we assume this extension has already been made.
Apart from minor technical differences, Definition 9 coincides with the definition of equivariant maps in [12]. We have thus obtained GCNNs almost directly from the definition of homogeneous vector bundles and a desire for layers to respect the global symmetry. This shows that homogeneous vector bundles are the natural setting for GCNNs.
Let us now define convolutional layers.
Of course, not any function κ : G → Hom(V ρ , V σ ) can be chosen as the kernel of a convolutional layer. The kernel must ensure both that (3.24) is bounded and that φf ∈ L 2 (G; σ) for each f ∈ L 2 (G; ρ). We give a sufficient condition for boundedness in Lemma 12 and the other requirement has been studied in detail in [12,31].
The next result is an almost immediate consequence of the Fubini-Tonelli theorem.
One way to ensure that the operators (3.24)-(3.25) are bounded is to put a bound on the kernel matrix elements $\kappa_{ij} : G \to \mathbb{C}$ for any given choice of bases in $V_\rho$, $V_\sigma$.
Proof. We need only prove that (3.25) is bounded; its adjoint (3.24) will then be bounded as well. Choose bases $e_1, \ldots, e_{\dim V_\rho} \in V_\rho$ and $\tilde{e}_1, \ldots, \tilde{e}_{\dim V_\sigma} \in V_\sigma$ and observe that, because $L^2(G; \sigma) \subset L^2(G, V_\sigma)$, Lemma 10 enables the decomposition of $f \in L^2(G; \sigma)$ into component functions $f_i \in L^2(G)$: $f = \sum_i f_i \tilde{e}_i$. To be clear, the kernel κ is similarly decomposed into matrix elements $\kappa_{ij} = \langle \tilde{e}_j, \kappa e_i \rangle_\sigma$ and we have $\kappa^*_{ji} = \overline{\kappa_{ij}}$. The integral (3.25) now takes the form (3.27) so by Young's convolution inequality,
We are interested in convolutional layers partly because they are concrete examples of G-equivariant layers, which we show next.

Proposition 13. Convolutional layers are G-equivariant layers.
Proof. Convolutional layers $\kappa \star \cdot : L^2(G; \rho) \to L^2(G; \sigma)$ are bounded linear operators by definition, so the only thing we need to prove is that κ ⋆ · intertwines the induced representations. This follows immediately from left-invariance of the Haar measure: For each $f \in L^2(G; \rho)$ and all g, h ∈ G,
$$[\kappa \star (\mathrm{Ind}_K^G \rho(g) f)](h) = \int_G \kappa(h^{-1} h') f(g^{-1} h') \, dh' = \int_G \kappa((g^{-1} h)^{-1} h') f(h') \, dh' = [\mathrm{Ind}_K^G \sigma(g) (\kappa \star f)](h).$$
Example 5. Let us describe where ordinary CNNs fit in the present context. CNNs represent the case $G = \mathbb{Z}^2$ when K = {0} is the trivial subgroup. The corresponding homogeneous space is $G/K = \mathbb{Z}^2/\{0\} = \mathbb{Z}^2$ and the quotient map q : G → G/K is thus the identity map on $\mathbb{Z}^2$. Its inverse, the identity map ω : G/K → G, is a globally defined gauge that eliminates the need for gauge equivariance, as we may choose to work exclusively in this one gauge. This is just a reflection of the fact that $q : \mathbb{Z}^2 \to \mathbb{Z}^2/\{0\}$ is (obviously) trivial as a principal bundle. Its associated bundles $E_\rho = \mathbb{Z}^2 \times_\rho V_\rho$ are also trivial: partly because the finite-dimensional K-representation ρ must be trivial, and partly because each equivalence class [g, v] only contains a single representative. These reasons are, of course, due to the triviality of K.
This is not to say that the equivariant framework of Section 2.2 is uninteresting when dealing with CNNs, or with GCNNs for other homogeneous spaces M = G/K with K trivial. We saw in Sections 3.2-3.3 how the homogeneity gives rise to induced representations, which encode the global symmetry in both data points and feature maps. This is a useful perspective to have, and G-equivariant layers are interesting even when the bundles are trivial.
Triviality of the associated bundles, $E_\rho \simeq \mathbb{Z}^2 \times \mathbb{C}^m$ where $m = \dim V_\rho$, implies that data points and feature maps are general square-integrable functions,
$$f \in L^2(\mathbb{Z}^2, \mathbb{C}^m), \tag{3.31}$$
and are thereby extensions of compactly supported functions $f : \mathbb{Z}^2 \to \mathbb{C}^m$. This ties well into the discussion in Section 2.1. Convolutional layers (3.24) reduce to bounded linear operators $L^2(\mathbb{Z}^2, \mathbb{C}^m) \to L^2(\mathbb{Z}^2, \mathbb{C}^n)$ and take the form
$$[\kappa \star f](x) = \sum_{y \in \mathbb{Z}^2} \kappa(y - x) f(y), \tag{3.32}$$
as the Haar measure on $\mathbb{Z}^2$ is the counting measure. The kernel $\kappa : \mathbb{Z}^2 \to \operatorname{Hom}(\mathbb{C}^m, \mathbb{C}^n)$ is finitely supported in practice, so boundedness of (3.32) is ensured by Lemma 12. Interestingly, all $\mathbb{Z}^2$-equivariant layers are convolutional layers; there are no other types of $\mathbb{Z}^2$-equivariant layers than (3.32). This is a consequence of Theorem 14 and is proven in Corollary 20 below.
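The translation equivariance of such a layer can be checked numerically. The following is a minimal sketch, not an implementation from the paper: it assumes scalar features ($m = n = 1$), replaces $\mathbb{Z}^2$ by a periodic lattice $\mathbb{Z}_N \times \mathbb{Z}_N$ to keep everything finite, and assumes the cross-correlation convention in which the kernel is evaluated at the offset $y - x$. The names `correlate`, `translate` and `max_abs_diff` are hypothetical helpers introduced only for this illustration.

```python
import random

N = 6  # side length of the periodic lattice Z_N x Z_N (assumed finite stand-in for Z^2)


def correlate(kernel, f):
    """[kappa * f](x) = sum_y kappa(y - x) f(y); kernel is a dict {offset: weight}."""
    out = [[0.0] * N for _ in range(N)]
    for x0 in range(N):
        for x1 in range(N):
            out[x0][x1] = sum(
                w * f[(x0 + d0) % N][(x1 + d1) % N] for (d0, d1), w in kernel.items()
            )
    return out


def translate(f, h):
    """Translation by h: (h . f)(x) = f(x - h)."""
    return [[f[(x0 - h[0]) % N][(x1 - h[1]) % N] for x1 in range(N)] for x0 in range(N)]


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(N) for j in range(N))


random.seed(0)
f = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(N)]
# A finitely supported 3x3 kernel, indexed by the offset y - x.
kernel = {(d0, d1): random.gauss(0.0, 1.0) for d0 in (-1, 0, 1) for d1 in (-1, 0, 1)}

h = (2, 5)
lhs = correlate(kernel, translate(f, h))  # translate first, then apply the layer
rhs = translate(correlate(kernel, f), h)  # apply the layer first, then translate
equivariance_error = max_abs_diff(lhs, rhs)
print(equivariance_error)
```

Both orders of operation sum the same products, so the printed discrepancy should be zero up to floating-point rounding for any choice of `h`.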
For more general groups G, it is no longer true that all G-equivariant layers are convolutional layers; we give an example of this fact in Example 6. Implementations of GCNNs, however, are usually based on convolutional layers, or on analogous layers in the Fourier domain. What consequences does the restriction to convolutional layers have for the expressivity of GCNNs? Can we tell whether a given G-equivariant layer is expressible as a convolutional layer? The answer to this last question, it turns out, requires the following notion of reproducing kernel Hilbert spaces.
Definition 11. Let $G$ be a group, let $V$ be a finite-dimensional normed vector space, and let $\mathcal{H}$ be a Hilbert space of functions $G \to V$. Then $\mathcal{H}$ is a reproducing kernel Hilbert space (RKHS) if the evaluation operator
$$E_g : \mathcal{H} \to V, \qquad E_g(f) = f(g),$$
is bounded for all $g \in G$. Moreover, by a left-invariant RKH subspace $\mathcal{H} \subseteq L^2(G, V)$ we mean a closed subspace that is both a RKHS and an invariant subspace for the left regular representation $\Lambda$ on $L^2(G, V)$.

Remark 7.
The term RKHS is typically reserved for the scalar case V = C, when the evaluation operator is a linear functional. Our version would instead be dubbed vector-valued RKHS. We see little benefit from distinguishing between these cases, however, so we use the term RKHS all-encompassingly.
The name RKHS is due to the existence of a kernel-type function that reproduces all elements of $\mathcal{H}$. To see how, choose an orthonormal basis $e_1, \ldots, e_{\dim V} \in V$ and write elements $v \in V$ as linear combinations $v = \sum_i v^i e_i$. The projection $P_i(v) = v^i$ onto the $i$'th component is always continuous, so the composition $E_{g,i} := P_i \circ E_g$ is a continuous linear functional for all $g \in G$ and $i = 1, \ldots, \dim V$. By the Riesz representation theorem, there are elements $\phi_{g,i} \in \mathcal{H}$ such that $f^i(g) = E_{g,i}(f) = \langle f, \phi_{g,i} \rangle$, hence
$$f(g) = \sum_i \langle f, \phi_{g,i} \rangle e_i.$$
When $\mathcal{H} \subseteq L^2(G, V)$ is a left-invariant RKH subspace, expanding the functions $\phi_{g,i}$ in the orthonormal basis, $\phi_{g,i} = \sum_j \phi^j_{g,i} e_j$, yields the formula
$$f(g) = \int_G \phi_g^*(g') f(g') \, dg',$$
where $\phi_g^*$ is the conjugate transpose of the matrix $(\phi_g)^j_i = \phi^j_{g,i}$. By left-invariance, $\phi_g(g') = \phi_e(g^{-1} g')$ and therefore
$$f(g) = \int_G \phi_e^*(g^{-1} g') f(g') \, dg', \tag{3.37}$$
hence $f \in \mathcal{H}$ is reproduced by the operator-valued kernel $\phi_e : G \to \operatorname{Hom}(V)$.
Remark 8. The reproducing kernel $\phi_e$ is unique and thus independent of the choice of basis in $V$. This follows from uniqueness in the Riesz representation theorem.
It is now clear why left-invariant RKH subspaces of $L^2(G, V)$ are relevant when discussing convolutional layers, as the latter are given by integral operators similar to (3.37). In order to show that an abstract $G$-equivariant layer $\varphi : L^2(G;\rho) \to L^2(G;\sigma)$ can be written as a convolutional layer, it is almost necessary for it to act in a RKHS:

Example 6. The identity operator $\varphi : L^2(G;\sigma) \to L^2(G;\sigma)$ is clearly a $G$-equivariant layer regardless of $G$, $K$, $\sigma$, but it is only a convolutional layer if $L^2(G;\sigma)$ is a RKHS. This is because when $\varphi$ is the identity, (3.24) becomes the reproducing property
$$f = \kappa \star f, \qquad f \in L^2(G;\sigma).$$
It follows that not every $G$-equivariant layer is a convolutional layer, because $L^2(G;\sigma)$ is not always a RKHS. When $\sigma$ is the trivial representation, for instance, $L^2(G;\sigma)$ reduces to $L^2(G)$, which is not a RKHS when $G$ is non-discrete [20, Theorem 2.42].
At this point, we know that global symmetry manifests itself in feature maps and data points through the induced representation, and we used this knowledge to define $G$-equivariant layers. We also defined convolutional layers and showed that these are special cases of $G$-equivariant layers, but the converse problem is much more subtle: When can a $G$-equivariant layer be expressed as a convolutional layer? The answer, as we have just seen, is directly related to the concept of RKHS and our next result makes this relation precise. It can be considered our main theorem.

Theorem 14. Let $G$ be a unimodular Lie group, let $K \leq G$ be a compact subgroup, and consider homogeneous vector bundles $E_\rho$, $E_\sigma$ over $\mathcal{M} = G/K$. Suppose that
$$\varphi : L^2(G;\rho) \to L^2(G;\sigma)$$
is a $G$-equivariant layer. If $\varphi$ maps into a left-invariant RKH subspace $\mathcal{H} \subseteq L^2(G;\sigma)$, then $\varphi$ is a convolutional layer.
Remark 10. While Theorem 14 is similar in spirit to [28, Theorem 1], there are also some clear differences. For example, we work with unimodular Lie groups whereas [28] use compact groups, but [28, Theorem 1] is also stronger in this case as there is no criterion on the layer. Another difference is that [28] analyzes the whole network structure while we focus on individual layers. We also assume that the homogeneous space $G/K$ is the same before and after each layer, in contrast to [28].
In the special case of single-layer networks with compact G, [28, Theorem 1] states that any G-equivariant layer is a convolutional layer. Example 6 seems to contradict this statement when G is non-discrete compact. This conflict is possibly due to minor technical differences in the assumptions on layers and data points, but we have not identified the precise cause.
We end this section with a result that could simplify the numerical computations of convolutional layers, as integrals over $G/K$ are sometimes easier to compute than integrals over $G$, for example when $G = SO(3)$, $K = SO(2)$, and $G/K \simeq S^2$. This result is similar to the generalized convolutions described in [28, Section 4.1].

Corollary 15. Let $\varphi : L^2(G;\rho) \to L^2(G;\sigma)$ be as in Theorem 14 and let $\kappa$ be the kernel of the resulting convolutional layer (3.42). Then the integral over $G$ defining $\kappa \star f$ can be rewritten as an integral over $G/K$ (3.43).

Proof. In the proof of Theorem 14, we constructed the kernel $\kappa$ from the components of $\phi_i \in L^2(G;\rho)$, and unitarity of $\rho$ clearly implies that the expression $\langle f(x), \phi_i(x) \rangle_\rho$ is well-defined. We may therefore use the unitary structure (3.11) to express each component function $(\varphi f)^i(g)$, for all $g \in G$, as an integral over $G/K$. We now obtain (3.43) by reconstructing $\kappa$ from its components $\kappa_{ij} = \phi_i^j$.

3.5. RKHS and bandlimited functions. The strength of Theorem 14 naturally depends on how common left-invariant RKH subspaces of $L^2(G;\sigma)$ are. Our analysis of $G$-equivariant layers would not be complete without a discussion on this topic.
Let us proceed by investigating when the component functions $f^i$ of $f \in L^2(G;\sigma)$ are contained in a left-invariant RKH subspace $\mathcal{H} \subset L^2(G)$; these subspaces have been fully characterized when the unimodular Lie group $G$ is of type I [8,20]. The unitary equivalence (3.20) then ensures that $A(\mathcal{H} \otimes V_\sigma) \subset L^2(G, V_\sigma)$ is a left-invariant RKH subspace, and so is the closed subspace
$$A(\mathcal{H} \otimes V_\sigma) \cap L^2(G;\sigma) \subseteq L^2(G;\sigma). \tag{3.48}$$

Remark 11. Groups of type I are, in a sense, groups with manageable representation theory. They include the most common groups, such as all finite, compact, or abelian groups, the Euclidean groups and many other groups. In particular, there is a considerable overlap between type I groups and the unimodular Lie groups that we already consider. See [19,20] for more details.
Remark 12. While $\rho$, $\sigma$ still denote finite-dimensional unitary representations of $K$, we reserve the letter $\gamma$ for elements of the unitary dual $\widehat{G}$, i.e., the space of equivalence classes of irreducible unitary representations of $G$. Specific representatives of $\gamma$ are written as $(\pi_\gamma, V_\gamma)$; note that $V_\gamma$ need not be finite-dimensional unless $G$ is compact. The unimodular Lie group $G$ is assumed to be of type I throughout this section.
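Proposition 16 ([20, Proposition 2.40]). Let $\mathcal{H} \subseteq L^2(G)$ be a left-invariant RKH subspace. The kernel $\phi \in \mathcal{H}$ is then a self-adjoint convolution idempotent, and $f = f * \phi$ for all $f \in \mathcal{H}$.

Example 7. Consider the real line $G = \mathbb{R}$ and suppose $\mathcal{H} \subseteq L^2(\mathbb{R})$ is a left-invariant RKH subspace with kernel $\phi \in \mathcal{H}$. The calculation in (3.37) with $V = \mathbb{C}$ shows that, for all $f \in \mathcal{H}$,
$$f(x) = \langle f, \lambda(x)\phi \rangle = \int_{\mathbb{R}} f(y) \, \overline{\phi(y - x)} \, dy = (f * \phi^*)(x), \qquad \phi^*(x) := \overline{\phi(-x)}.$$
Since the regular representation $\lambda$ is continuous, $f$ must be continuous, so $\mathcal{H} \subset C(\mathbb{R})$. Setting $f = \phi$ shows that the kernel is a self-adjoint convolution idempotent:
$$\phi = \phi * \phi^*, \quad \text{hence} \quad \phi^* = \phi \quad \text{and} \quad \phi = \phi * \phi.$$
Combining the Plancherel transform on $L^2(\mathbb{R})$ (see Theorem 17 and Section 3.5.1) with the convolution theorem in Fourier analysis, we observe that, for all $f \in \mathcal{H}$,
$$\hat{f} = \widehat{f * \phi} = \hat{f} \, \hat{\phi}. \tag{3.52}$$
In particular, $\hat{\phi} = \hat{\phi}^2$, so $\hat{\phi}$ is the characteristic function $1_E$ of a subset $E \subset \widehat{\mathbb{R}} \simeq \mathbb{R}$. Inserting $\hat{\phi} = 1_E$ in (3.52) immediately tells us that $\operatorname{supp}(\hat{f}) \subset E$, so $\mathcal{H}$ is a space of bandlimited functions. Moreover, the set $E$ has finite Lebesgue measure according to the Plancherel theorem: $\operatorname{vol}(E) = \|1_E\|_2^2 = \|\phi\|_2^2 < \infty$. This example illustrates that any measurable subset $E \subset \mathbb{R}$ with finite Lebesgue measure induces a left-invariant closed RKH subspace of functions bandlimited to $E$; see [20].

Let us take the short route of stating a theorem on the direct integral decomposition of the left regular representation $\lambda$ and its commutant and discuss a few consequences of this decomposition, before restricting attention to two important cases where we can be more explicit: abelian and compact groups.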
called the Plancherel transform of $G$. (b) $\mathcal{P}$ implements unitary equivalences that decompose the left regular representation $\lambda$ and its commutant $\lambda(G)'$ as direct integrals over $\widehat{G}$.

Observe that if $\mathcal{H} \subseteq L^2(G)$ is a left-invariant closed subspace, then the projection $P : L^2(G) \to \mathcal{H}$ commutes with the left regular representation and is thus an element of the commutant $\lambda(G)'$. It therefore has a direct integral decomposition (3.59) with components $\hat{P}_\gamma \in B(V_\gamma)$ for each $\gamma \in \widehat{G}$.

We interpret this theorem as a bandwidth restriction, similar to Example 7. The integrand in (3.60) is an integer-valued function on $\widehat{G}$, so the integral is finite only if the projection (3.59) is supported on a set $E \subseteq \widehat{G}$ of finite Plancherel measure (3.61). That is, the left-invariant RKH subspaces $\mathcal{H} \subseteq L^2(G)$ are precisely those subspaces whose elements are bandlimited on a set $E \subseteq \widehat{G}$, in the sense that the Plancherel transform of each $f \in \mathcal{H}$ vanishes outside $E$. Bandlimited functions are thus central to the theory of RKHS and, by extension, to the mathematical theory of GCNNs. Indeed, by extending the concept of bandwidth to feature maps, through (3.48), we obtain the following rephrasing of Theorem 14.
Corollary 19. Let $G$ be a unimodular Lie group of type I, let $K \leq G$ be a compact subgroup, and consider homogeneous vector bundles $E_\rho$, $E_\sigma$ over $G/K$. Suppose that
$$\varphi : L^2(G;\rho) \to L^2(G;\sigma)$$
is a $G$-equivariant layer. If $\varphi$ maps into a space of bandlimited functions, then $\varphi$ is a convolutional layer.
Remark 14. The relevance of bandwidth for convolutional layers has already been recognized in the case of azimuthally equivariant linear operators on $L^2(S^2)$ [45]. In our setting, these operators translate to certain $G$-equivariant layers when $G = SO(3)$, $K = SO(2)$, and $\rho$, $\sigma$ are the trivial representation.
Remark 15. Some implementations of GCNNs use Fourier transforms and a variant of the convolution theorem $\widehat{f_1 * f_2} = \hat{f}_1 \hat{f}_2$ to compute convolutional layers [16,29,45,48]. Feature maps $f$ are then represented by their Fourier transform $\hat{f}$ which, for numerical reasons, is only approximated up to a finite bandlimit. That is, bandwidth is already being used in implementations.
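The mechanism behind Remark 15 can be illustrated on the simplest possible group. The sketch below is an assumption-laden toy, not the method of [16,29,45,48]: it uses the cyclic group $\mathbb{Z}_N$ with the ordinary discrete Fourier transform, whereas the cited implementations work with non-abelian transforms. It checks the convolution theorem and then shows what discarding all frequencies outside a hypothetical band `E` does to the result. The helper names `dft`, `idft` and `convolve` are introduced here for illustration only.

```python
import cmath
import random

N = 8  # order of the cyclic group Z_N (assumed finite abelian stand-in for G)


def dft(f):
    """Fourier transform on Z_N: f_hat(k) = sum_x f(x) exp(-2 pi i k x / N)."""
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * k * x / N) for x in range(N))
        for k in range(N)
    ]


def idft(f_hat):
    """Inverse transform: f(x) = (1/N) sum_k f_hat(k) exp(2 pi i k x / N)."""
    return [
        sum(f_hat[k] * cmath.exp(2j * cmath.pi * k * x / N) for k in range(N)) / N
        for x in range(N)
    ]


def convolve(f1, f2):
    """Group convolution (f1 * f2)(x) = sum_y f1(y) f2(x - y)."""
    return [sum(f1[y] * f2[(x - y) % N] for y in range(N)) for x in range(N)]


random.seed(0)
f1 = [random.gauss(0.0, 1.0) for _ in range(N)]
f2 = [random.gauss(0.0, 1.0) for _ in range(N)]

# Convolution theorem: multiply the transforms pointwise, then invert.
direct = convolve(f1, f2)
product = [a * b for a, b in zip(dft(f1), dft(f2))]
spectral = idft(product)
theorem_error = max(abs(u - v) for u, v in zip(direct, spectral))

# Keep only the frequencies in a band E, as a bandlimited implementation would.
E = {0, 1, N - 1}
truncated = idft([p if k in E else 0.0 for k, p in enumerate(product)])
truncation_error = max(abs(u - v) for u, v in zip(direct, truncated))
print(theorem_error, truncation_error)
```

The first printed number is at the level of rounding error, while the second measures the information lost by the bandlimit for this particular pair of random signals.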
3.5.1. Abelian groups. The irreducible representations $\gamma \in \widehat{G}$ of any abelian group $G$ are 1-dimensional, and may thus be identified with their character $\chi_\gamma = \operatorname{tr} \pi_\gamma$. There are several useful consequences of this fact. First, the unitary dual $\widehat{G}$ is now the set of continuous homomorphisms $\chi : G \to \mathbb{T}$, where $\mathbb{T}$ is the circle group. This is a locally compact group with respect to pointwise multiplication and, as $\gamma \in \widehat{G}$ is unitary, we may write $\chi_\gamma = e^{i \xi_\gamma}$ where $\xi_\gamma : G \to \mathbb{R}$. The Fourier transform then takes the more familiar form
$$\hat{f}(\gamma) = \int_G f(g) e^{-i \xi_\gamma(g)} \, dg, \tag{3.65}$$
for $f \in L^1(G) \cap L^2(G)$. Moreover, the Haar measure on $\widehat{G}$ can be made to coincide with the Plancherel measure such that (3.56) becomes a unitary equivalence
$$\mathcal{P} : L^2(G) \to L^2(\widehat{G}). \tag{3.66}$$
Another consequence of the fact that irreducible representations are 1-dimensional is that the integrand in (3.60) takes values in $\{0, 1\}$ and (3.61) becomes an equality. By the same arguments as in Example 7, we see that the left-invariant RKH subspaces $\mathcal{H} \subset L^2(G)$ are the spaces of bandlimited functions, $\operatorname{supp}(\hat{f}) \subset E$, for subsets $E \subset \widehat{G}$ of finite Haar/Plancherel measure. Also, the kernel $\phi_E \in \mathcal{H}$ is the inverse Plancherel transform of the characteristic function $1_E$.
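These statements are easy to verify by hand on a finite abelian group. The following sketch assumes $G = \mathbb{Z}_N$, whose dual is again $\mathbb{Z}_N$ with Plancherel measure equal to the counting measure divided by $N$, and an arbitrarily chosen band `E`. It builds the kernel as the inverse transform of $1_E$ and checks that it is a self-adjoint convolution idempotent that reproduces bandlimited functions and annihilates a frequency outside the band. All names in the code are hypothetical and chosen for this illustration.

```python
import cmath

N = 12                       # G = Z_N, whose dual is again Z_N
E = {0, 1, 2, N - 2, N - 1}  # assumed band: frequencies k with |k| <= 2 (mod N)


def character(k, x):
    """chi_k(x) = exp(2 pi i k x / N)."""
    return cmath.exp(2j * cmath.pi * k * x / N)


def convolve(f1, f2):
    """Group convolution (f1 * f2)(x) = sum_y f1(y) f2(x - y)."""
    return [sum(f1[y] * f2[(x - y) % N] for y in range(N)) for x in range(N)]


def max_abs_diff(a, b):
    return max(abs(u - v) for u, v in zip(a, b))


# Kernel phi_E: inverse transform of the characteristic function 1_E.
phi = [sum(character(k, x) for k in E) / N for x in range(N)]

# A function bandlimited to E, and a pure frequency outside E.
coeffs = {0: 1.0, 1: 0.5 - 2.0j, N - 2: 3.0}
f = [sum(c * character(k, x) for k, c in coeffs.items()) for x in range(N)]
g = [character(5, x) for x in range(N)]

# phi * phi = phi (convolution idempotent)
idempotent_error = max_abs_diff(convolve(phi, phi), phi)
# phi(x) = conj(phi(-x)) (self-adjoint)
self_adjoint_error = max(abs(phi[x] - phi[-x % N].conjugate()) for x in range(N))
# f * phi = f for f bandlimited to E (reproducing property)
reproducing_error = max_abs_diff(convolve(f, phi), f)
# g * phi = 0 for a frequency outside E
leakage = max(abs(v) for v in convolve(g, phi))
print(idempotent_error, self_adjoint_error, reproducing_error, leakage)
```

All four printed quantities should vanish up to floating-point rounding.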
Corollary 20. If G is a discrete and abelian group, then any G-equivariant layer is a convolutional layer.
Proof. Discrete abelian groups are unimodular Lie groups of type I, so we may use results from the current section. We note that the integral (3.60) converges for any left-invariant, closed subspace $\mathcal{H} \subseteq L^2(G)$, as the unitary dual $\widehat{G}$ is compact when $G$ is discrete and abelian [14, Proposition 3.1.5], and because the integrand is bounded. Consequently, $\mathcal{H} = L^2(G)$ is itself a RKHS, and the same is true for both $L^2(G, V_\sigma) \simeq L^2(G) \otimes V_\sigma$ and its closed, left-invariant subspace $L^2(G;\sigma)$, independently of $K \leq G$ and $(\sigma, V_\sigma)$. The result now follows from Theorem 14.
By setting $G = \mathbb{Z}^2$, Corollary 20 establishes that convolutional layers are the only possible translation equivariant layers in the ordinary CNN setting.
3.5.2. Compact groups. When the group $G$ is compact, all irreducible representations are finite-dimensional. Furthermore, the unitary dual $\widehat{G}$ is discrete and the Plancherel measure on $\widehat{G}$ is simply the counting measure. For these reasons, the integral (3.60) reduces to a discrete sum with finite summands, and converges iff $\hat{P}_\gamma = 0$ for all but finitely many $\gamma \in \widehat{G}$.
Corollary 21. If G is finite, then any G-equivariant layer is a convolutional layer.
Proof. When $G$ is a finite group, $\widehat{G}$ is also finite [19, Proposition 5.27] and the integral (3.60) reduces to a finite sum. That is, $L^2(G)$ is a RKHS and the result now follows in the same way as Corollary 20.
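For a small finite group, the content of Corollary 21 can be checked directly. The sketch below makes several simplifying assumptions that are not part of the corollary: $G = S_3$, trivial $K$, scalar features, and the convention that the kernel is evaluated at $g^{-1} g'$. It manufactures an equivariant linear operator by averaging a random matrix over the group, reads off a kernel from the row at the identity, and confirms that the operator coincides with the corresponding convolutional layer. The names `Phi`, `kappa`, `apply_operator` and `conv_layer` are hypothetical.

```python
import itertools
import random

G = list(itertools.permutations(range(3)))  # the symmetric group S_3 (non-abelian)
e = (0, 1, 2)                               # identity permutation


def mul(a, b):
    """Group product: (a b)(i) = a(b(i))."""
    return tuple(a[b[i]] for i in range(3))


def inv(a):
    out = [0, 0, 0]
    for i, ai in enumerate(a):
        out[ai] = i
    return tuple(out)


def translate(f, h):
    """Left regular representation: (h . f)(g) = f(h^{-1} g)."""
    return {g: f[mul(inv(h), g)] for g in G}


def apply_operator(op, f):
    """Apply a linear operator given by its matrix entries op[(g, g')]."""
    return {g: sum(op[(g, gp)] * f[gp] for gp in G) for g in G}


def conv_layer(kappa, f):
    """[kappa * f](g) = sum_{g'} kappa(g^{-1} g') f(g')."""
    return {g: sum(kappa[mul(inv(g), gp)] * f[gp] for gp in G) for g in G}


def max_abs_diff(a, b):
    return max(abs(a[g] - b[g]) for g in G)


random.seed(0)
A = {(g, gp): random.gauss(0.0, 1.0) for g in G for gp in G}
# Averaging over G forces Phi[hg, hg'] = Phi[g, g'], so Phi commutes with translations.
Phi = {
    (g, gp): sum(A[(mul(h, g), mul(h, gp))] for h in G) / len(G)
    for g in G
    for gp in G
}

f = {g: random.gauss(0.0, 1.0) for g in G}
equivariance_error = max(
    max_abs_diff(apply_operator(Phi, translate(f, h)), translate(apply_operator(Phi, f), h))
    for h in G
)

kappa = {x: Phi[(e, x)] for x in G}  # kernel read off from the row at the identity
convolution_error = max_abs_diff(apply_operator(Phi, f), conv_layer(kappa, f))
print(equivariance_error, convolution_error)
```

Both printed errors should be at the level of floating-point rounding, reflecting that the invariance $\Phi[hg, hg'] = \Phi[g, g']$ leaves exactly $|G|$ free parameters, which are the values of the kernel.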

Discussion
In this paper, we have investigated the mathematical foundations of $G$-equivariant convolutional neural networks (GCNNs), which are designed for deep learning tasks exhibiting global symmetry. We presented a basic framework for equivariant neural networks that includes both gauge equivariant neural networks and GCNNs as special cases. We also demonstrated how GCNNs can be obtained from homogeneous vector bundles, when $G$ is a unimodular Lie group and $K \leq G$ is a compact subgroup.
In Theorem 14, we gave a precise criterion for when a given G-equivariant layer is, in fact, a convolutional layer. This criterion uses reproducing kernel Hilbert spaces (RKHS) and cannot be circumvented, as shown in Example 6. After discussing the relation between RKHS and bandwidth, we were able to reformulate Theorem 14 to get an analogous bandlimit-criterion in Corollary 19. In Corollaries 20-21, we showed that the criterion is automatically satisfied when G is discrete abelian or finite, hence all G-equivariant layers are convolutional layers for these groups.
One limitation of the current paper, compared to [12,28], is that the homogeneous space G/K does not change between layers. This restriction was made in order to limit the scope of our analysis, and the same goes for our restriction to unimodular Lie groups. It would be interesting to go beyond these restrictions in the future.
I am grateful for the support from my research group: Oscar Carlsson, Jan Gerken, Hampus Linander, Fredrik Ohlsson, Christoffer Petersson, and last but not least, my advisor Daniel Persson. Thanks also to David Müller for the interesting discussions on gauge equivariance in physics. This work was supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.