7.1 Introduction

Hypergraphs have demonstrated their ability to model and learn complex correlations in recent years. Zhou et al. [1] introduced hypergraph learning, which conducts transductive learning and propagates information on the hypergraph structure. Transductive inference on the hypergraph aims to minimize the label difference between vertices with stronger connections. Hypergraph learning has been extensively developed and applied in several fields over the past few years.

In addition, hypergraph has been investigated in deep learning applications. Based on the hypergraph Laplacian and the Chebyshev formula, Feng et al. [2] first introduced hypergraph neural networks (HGNN). Yadati et al. [3] synthesized the hypergraph Laplacian using predictions, while Bai et al. [4] defined two neural hypergraph operators based on [5, 6]. However, these methods construct a simple weighted graph and apply mature graph learning algorithms; by introducing only vertex functions, they do not implement truly high-order learning. The lack of powerful tools for expressing hyperstructures, together with the wealth of graph literature, motivated the work of [7]. Additionally, recent successes in graph representation learning have been achieved by using neural operators (convolution, attention, spectral, etc.). Generally, neural networks on hypergraph can be divided into two categories: spectral-based methods and spatial-based methods.

For the spectral-based methods, Feng et al. [2] introduced hypergraph neural networks (HGNN) for modeling and learning beyond-pairwise complex correlations. Different from traditional graph neural networks (GNNs), HGNN learns its data representation by iteratively propagating information in a vertex–hyperedge–vertex pattern. Additionally, the hypergraph Laplacian is approximated and introduced into the deep hypergraph learning method for the first time to speed up the learning process. Following [2], Bai et al. [4] developed an attention module based on the hypergraph convolution pattern (Hyper-Atten). Hyper-Atten introduces a hyperedge–vertex attention learning module that adaptively identifies the importance of different vertices in a hyperedge, thus revealing the intrinsic correlations between vertices.

For the spatial-based methods, Atwood et al. [8] made use of transition matrices to define the neighborhoods of vertices. The generalization of convolution in the spatial domain is achieved using Gaussian mixture models based on local patch operators. An attention-based architecture was built in [9] for analyzing vertices on a graph with the attention mechanism. Dynamic changes in the hypergraph structure were taken into consideration in [6], and the framework introduced in [6] is more versatile than HGNN [2]. A unified hypergraph is constructed by merging the correlations from different modalities/types using an adaptive hyperedge grouping strategy, and a hypergraph convolution scheme [6] is then performed in the spatial domain to learn a general data representation for various tasks.

Other methods have explored hypergraph spectral theory [10] far less. The concept of hypergraph learning was first introduced by Zhou et al. [1], where it was presented as a propagation process. According to [11], however, the resulting Laplacian matrix is equivalent to pairwise operations. Several studies have since addressed non-pairwise relationships, including developing nonlinear Laplacian operators [12, 13], learning the optimal parameters of hyperedges [13, 14], and utilizing random walk techniques [10]. In these algorithms, hyperedges are regarded merely as connectors, which explicitly breaks the bipartite property of the hypergraph by focusing on vertices.

In this chapter, we systematically introduce the above types of neural networks on hypergraph and compare graph neural networks and hypergraph neural networks from both spectral and spatial aspects. Part of the work introduced in this chapter has been published in [2, 15, 16].

7.2 Spectral-Based Neural Networks on Hypergraph

Spectral neural network methods have attracted much attention since Bruna et al. [17] and Kipf et al. [18] simplified them into the graph convolutional network pattern. The data are transformed from the common domain to the spectral domain, processed there according to spectral graph theory and the convolution theorem, and then transformed back to the common domain. In other words, we first convert the signal from the common domain to the frequency domain (implemented by the Fourier transform) and multiply it by the filter; the result of this multiplication is then converted back to the common domain (implemented by the inverse Fourier transform). We will present spectral-based hypergraph neural network methods, including hypergraph neural networks (HGNN) [2], hypergraph convolution with attention (Hyper-Atten) [4], and hyperbolic hypergraph neural networks (HHGNN) [19]. In particular, HHGNN extends hypergraph learning to hyperbolic spaces beyond the Euclidean space.

7.2.1 Hypergraph Neural Networks

Given a hypergraph \({\mathbb {G}} = (\mathbb {V}, \mathbb {E}, \varDelta )\) with N vertices, the hypergraph Laplacian Δ is an N × N positive semi-definite matrix. The matrix of orthonormal eigenvectors Φ = (ϕ 1, …, ϕ N) and the diagonal matrix Λ = diag(λ 1, …, λ N), which contains the corresponding non-negative eigenvalues, are obtained by the eigendecomposition \(\varDelta = \varPhi \varLambda \varPhi ^\top \). The Fourier transform of a signal x = (x 1, …, x N) on the hypergraph is defined as \(\hat { x}={ \varPhi }^\top { x}\). The eigenvectors are regarded as the Fourier bases and the eigenvalues as the frequencies. The spectral convolution of signal x and filter g can be denoted as

$$\displaystyle \begin{aligned} { g \star x} = { \varPhi((\varPhi^\top} { g) \odot (\varPhi^\top} { x})) = { \varPhi}g({ \varLambda}){ \varPhi}^\top{ x}, \end{aligned} $$
(7.1)

where ⊙ denotes the element-wise Hadamard product and g(Λ) = diag(g(λ 1), …, g(λ N)) indicates a function of the Fourier coefficients. However, the computational cost of the forward and inverse Fourier transforms is O(N 2), which is high. To solve this problem, Defferrard et al. [20] parameterize g(Λ) with a K-order polynomial, one choice being the truncated Chebyshev expansion. The Chebyshev polynomials T k(x) are computed by the recurrence T k(x) = 2xT k−1(x) − T k−2(x), with T 0(x) = 1 and T 1(x) = x. The convolution can then be approximated by

$$\displaystyle \begin{aligned} { g \star x}\approx \sum_{k=0}^{K}\theta_k T_k(\tilde{ \varDelta}){ x}, {} \end{aligned} $$
(7.2)

where \(T_k(\tilde { \varDelta })\) denotes the Chebyshev polynomial of order k with the scaled Laplacian \(\tilde { \varDelta } = \frac {2}{\lambda _{max}}{ \varDelta } - I\). In Eq. (7.2), matrix powers, additions, and multiplications replace the expensive computation of the Laplacian eigenvectors, which further reduces the computational complexity. Since the hypergraph Laplacian already encodes the high-order correlations among nodes, the order of the convolution operation can be further limited to K = 1. Kipf et al. [18] suggest setting λ max ≈ 2 for the scale adaptability of neural networks. The convolution operation can then be simplified to

$$\displaystyle \begin{aligned} { g \star x} \approx \theta_0 {x} - \theta_1 {\mathbf{D}}_v^{-1/2} \mathbf{HWD}_e^{-1} {\mathbf{H}}^\top {\mathbf{D}}^{-1/2}_v {x}, \end{aligned} $$
(7.3)

where θ 0 and θ 1 represent the parameters of all node filters. In addition, a single parameter θ is used to avoid the overfitting problem, which is defined as

$$\displaystyle \begin{aligned} \left\{ \begin{array}{lr} \theta_1 = -\frac{1}{2}\theta \\[0.1cm] \theta_0 = \frac{1}{2}\theta {\mathbf{D}}^{-1/2}_{v} \mathbf{H} {\mathbf{D}}_{e}^{-1} {\mathbf{H}}^\top{ \mathbf{D}}^{-1/2}_{v}. \end{array} \right. \end{aligned} $$
(7.4)

Thereafter, the convolution process can be simplified to the following function:

$$\displaystyle \begin{aligned} \begin{array}{ll} { g \star x} &\approx \frac{1}{2}\theta {\mathbf{D}}^{-1/2}_{v} \mathbf{H}(\mathbf{W}+\mathbf{I}) {\mathbf{D}}_{e}^{-1} {\mathbf{H}}^\top{ \mathbf{D}}^{-1/2}_{v}{ x} \\[0.2cm] &\approx \theta {\mathbf{D}}^{-1/2}_{v} \mathbf{H} \mathbf{W} {\mathbf{D}}_{e}^{-1} {\mathbf{H}}^\top{ \mathbf{D}}^{-1/2}_{v}{ x}, \end{array} \end{aligned} $$
(7.5)

where (W + I) can be regarded as the weight of the hyperedges. In the initialization, W can be set to the identity matrix so that all hyperedges are assigned equal weights.

Given the hypergraph signal X t at the t-th layer, the hyperedge convolution layer HGNNConv can be formulated as

$$\displaystyle \begin{aligned} {\mathbf{X}}^{t+1} = \sigma({\mathbf{D}}^{-1/2}_{v} \mathbf{HWD}_e^{-1}{\mathbf{H}}^\top {\mathbf{D}}^{-1/2}_v{\mathbf{X}}^{t} {\varTheta}), \end{aligned} $$
(7.6)

where Θ is the parameter to be learned during the training process. To extract features from the hypergraph, the filter Θ is applied over the vertices. The output X t+1 obtained after convolution can be used for further processing.
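
To make Eq. (7.6) concrete, the following PyTorch sketch implements one HGNNConv layer under the assumption that a binary incidence matrix H and a hyperedge weight vector w are given; the class name, the use of ReLU as σ, and the toy inputs are illustrative choices rather than details of the original HGNN release.

```python
import torch
import torch.nn as nn


class HGNNConv(nn.Module):
    """One spectral hyperedge convolution layer, Eq. (7.6)."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.theta = nn.Linear(in_channels, out_channels, bias=False)

    def forward(self, X: torch.Tensor, H: torch.Tensor,
                w: torch.Tensor) -> torch.Tensor:
        # X: (N, C) vertex features, H: (N, M) incidence matrix,
        # w: (M,) hyperedge weights (equal weights -> w = ones(M)).
        Dv = (H * w).sum(dim=1)                  # weighted vertex degrees, (N,)
        De = H.sum(dim=0)                        # hyperedge degrees, (M,)
        Dv_inv_sqrt = torch.diag(Dv.clamp(min=1e-12).pow(-0.5))
        De_inv = torch.diag(De.clamp(min=1e-12).pow(-1.0))
        W = torch.diag(w)
        # A = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
        A = Dv_inv_sqrt @ H @ W @ De_inv @ H.t() @ Dv_inv_sqrt
        return torch.relu(A @ self.theta(X))     # sigma chosen as ReLU here


# Toy usage: 4 vertices, 2 hyperedges.
H = torch.tensor([[1., 0.], [1., 1.], [0., 1.], [1., 1.]])
X = torch.randn(4, 8)
layer = HGNNConv(8, 16)
out = layer(X, H, torch.ones(2))
print(out.shape)  # torch.Size([4, 16])
```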

The framework of the abovementioned HGNN model is shown in Fig. 7.1. HGNN is able to address the challenges of learning representations for complex data by incorporating such data structures into a hypergraph, which is more flexible and effective when confronting practical data.

Fig. 7.1 The framework of the HGNN model

The HGNN calculation stages are shown in Fig. 7.2, where the three processes correspond directly to the terms of the convolution. We can observe that the framework consists of vertex feature transformation, hyperedge feature gathering, and vertex feature aggregation steps.

Fig. 7.2 The calculation process of the HGNN framework

7.2.2 Hypergraph Convolution and Hypergraph Attention

Based on the study of hypergraph neural networks [2], Bai et al. [4] introduced hypergraph convolution and hypergraph attention (Hyper-Atten) by incorporating an attention mechanism into the framework.

In this method, an explicit magnitude of importance is assigned to the afferent and efferent information flows of a given vertex, so that the transition probabilities between vertices take non-binary values. However, such a hypergraph convolution only works after the structure (the incidence matrix H) is given, instead of learning a dynamic incidence matrix. A dynamic transition matrix reveals the intrinsic relationships between vertices more easily than a fixed incidence matrix. Therefore, an attention learning module is imposed on H, which no longer treats each vertex as simply connected or not connected by a hyperedge, but instead assigns non-binary, real values to measure the degree of connectivity. Following [6], when the vertex set and the hyperedge set are comparable, the attention score between a given vertex x i and its associated hyperedge x j can be written as

$$\displaystyle \begin{aligned} {\mathbf{H}}_{i j}=\frac{\exp \left(\sigma\left(\operatorname{sim}\left(x_{i} \mathbf{P}, x_{j} \mathbf{P}\right)\right)\right)}{\sum_{k \in {N}_{i}} \exp \left(\sigma\left(\operatorname{sim}\left(x_{i} \mathbf{P}, x_{k} \mathbf{P}\right)\right)\right)}, \end{aligned} $$
(7.7)

where σ(⋅) is a nonlinear activation function, and \( \mathbf {P} \in \mathbb {R}^{F^{(l)} \times F^{(l+1)}} \) denotes the weight matrix between the l-th and (l + 1)-th layers. N i is the neighborhood set of x i. The pairwise similarity of two vertices is computed with the similarity function \(\operatorname {sim}(\cdot )\):

$$\displaystyle \begin{aligned} \operatorname{sim}\left(x_{i}, x_{j}\right)={\mathbf{a}}^{\top}\left[x_{i} \| x_{j}\right] . \end{aligned} $$
(7.8)

The operation [⋅ ∥ ⋅] indicates concatenation, and a is a weight vector used to output a scalar similarity value.

When following Eq. (7.6) to learn the intermediate embeddings of vertices layer by layer, hypergraph attention also propagates gradients to H in addition to X (l) and Θ. Therefore, Eq. (7.7) gives the share of hyperedge x j among the neighbors of vertex x i, which indicates the relative importance of x j to x i. With this probabilistic model, more discriminative embeddings can be learned, and the relationships between vertices can be described more accurately.

In order to further enhance the capability of representation learning, the method builds the hypergraph attention mechanism on top of the basic formulation of hypergraph convolution.
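
As a rough sketch of Eqs. (7.7) and (7.8), the following snippet computes the attention scores between one vertex and its candidate hyperedges, assuming that vertex and hyperedge representations lie in the same feature space; the random projection P, the weight vector a, and the choice of LeakyReLU as σ are illustrative assumptions, not details fixed by [4].

```python
import torch
import torch.nn.functional as F

F_in, F_out = 8, 16
P = torch.randn(F_in, F_out)      # shared projection P in Eq. (7.7)
a = torch.randn(2 * F_out)        # similarity weight vector a in Eq. (7.8)

def sim(xi: torch.Tensor, xj: torch.Tensor) -> torch.Tensor:
    # sim(x_i, x_j) = a^T [x_i || x_j]
    return a @ torch.cat([xi, xj])

def attention_row(x_i: torch.Tensor, hyperedges: torch.Tensor) -> torch.Tensor:
    # x_i: (F_in,) vertex feature; hyperedges: (K, F_in) features of the
    # hyperedges in the neighborhood N_i.  Returns the entries H_{ij} of
    # Eq. (7.7), normalized over the neighborhood by the softmax.
    scores = torch.stack([
        F.leaky_relu(sim(x_i @ P, x_j @ P)) for x_j in hyperedges
    ])
    return torch.softmax(scores, dim=0)

x_i = torch.randn(F_in)
edges = torch.randn(3, F_in)       # three candidate hyperedges
print(attention_row(x_i, edges))   # sums to 1 over the neighborhood
```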

7.2.3 Hyperbolic Hypergraph Neural Networks

The hyperbolic space is a manifold with constant negative Gaussian curvature everywhere, and it admits several models. Similar to [21, 22], the work is based on the Poincaré ball model because it is well suited to gradient-based optimization. The Poincaré ball model with constant negative curvature − 1∕k (k > 0) corresponds to the Riemannian manifold \( \left (\mathbb {P}^{n,k}, g_{\mathbf {x}}^{\mathbb {P}}\right )\). \(\mathbb {P}^{n,k} = \left \{\mathbf {x} \in \mathbb {R}^{n}: \| \mathbf {x}\|<1 \right \}\) is an open n-dimensional unit ball, where ∥⋅∥ denotes the Euclidean norm. Its metric tensor is \(g_{\mathbf {x}}^{\mathbb {P}} = \lambda _{\mathbf {x}}^{2} g^{\mathbb {E}}\), where \(\lambda _{\mathbf {x}} = \frac {2} {1- k\|\mathbf {x}\|{ }^{2} }\) is the conformal factor and \(g^{{\mathbb {E}}}={\mathbf {I}}_{n}\) is the Euclidean metric tensor. Then, the Möbius addition of two points \(\mathbf {x}, \mathbf {y} \in \mathbb {P}^{n,k}\) is defined as follows:

$$\displaystyle \begin{aligned} \mathbf{x} \oplus_{k} \mathbf{y}=\frac{\left(1+2k\langle\mathbf{x}, \mathbf{y}\rangle+k\|\mathbf{y}\|{}^{2}\right) \mathbf{x}+\left(1-k\|\mathbf{x}\|{}^{2}\right) \mathbf{y}}{1+2k\langle\mathbf{x}, \mathbf{y}\rangle+k^{2}\|\mathbf{x}\|{}^{2}\|\mathbf{y}\|{}^{2}} . \end{aligned} $$
(7.9)

The distance between two points \(\mathbf {x}, \mathbf {y} \in \mathbb {P}^{n,k}\) is calculated by integration of the metric tensor, which is given as

$$\displaystyle \begin{aligned} d_{\mathbb{P}}^{k} (\mathbf{x}, \mathbf{y}) = (2 / \sqrt{k}) \tanh ^{-1}\left(\sqrt{k}\left\|-\mathbf{x} \oplus_{k} \mathbf{y}\right\|\right) . \end{aligned} $$
(7.10)

Let \(\mathbb {T}_{\mathbf {x}} \mathbb {P}^{n,k}\) denote the tangent (Euclidean) space centered at any point x in the hyperbolic space, and let \(\mathbf {z} \in \mathbb {T}_{\mathbf {x}} \mathbb {P}^{n,k}\) be a tangent vector. For a tangent vector z ≠ 0 and a point y ≠ x, the exponential map \(\exp _{\mathbf {x}}: \mathbb {T}_{\mathbf {x}} \mathbb {P}^{n,k} \rightarrow \mathbb {P}^{n,k}\) and the logarithmic map \(\log _{\mathbf {x}}: \mathbb {P}^{n,k} \rightarrow \mathbb {T}_{\mathbf {x}} \mathbb {P}^{n,k}\) are given by

$$\displaystyle \begin{aligned} \exp _{\mathbf{x}}^{k}(\mathbf{z})=\mathbf{x} \oplus_{k}\left(\tanh \left(\sqrt{k} \frac{\lambda_{\mathbf{x}}^{k}\|\mathbf{z}\|}{2}\right) \frac{\mathbf{z}}{\sqrt{k}\|\mathbf{z}\|}\right) \end{aligned} $$
(7.11)

and

$$\displaystyle \begin{aligned} \log _{\mathbf{x}}^{k}(\mathbf{y})=\frac{2}{\sqrt{k} \lambda_{\mathbf{x}}^{k}} \tanh ^{-1}\left(\sqrt{k}\left\|-\mathbf{x} \oplus_{k} \mathbf{y}\right\|\right) \frac{-\mathbf{x} \oplus_{k} \mathbf{y}}{\left\|-\mathbf{x} \oplus_{k} \mathbf{y}\right\|} . \end{aligned} $$
(7.12)

The transformation between the tangent space and the hyperbolic space is shown in Fig. 7.3. By leveraging the exp and log maps, the tangent space \(\mathbb {T}_{\mathbf {x}} \mathbb {P}\) can be used to perform Euclidean operations such as convolution and activation. In the convolution, vertex information is first gathered to the hyperedges for storage, and then each vertex aggregates the information of its connected hyperedges.
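
The following numpy sketch implements the Möbius addition, distance, and exponential/logarithmic maps of Eqs. (7.9)–(7.12); it is a minimal single-vector illustration with no numerical safeguards, not the reference implementation of [19].

```python
import numpy as np

def mobius_add(x, y, k):
    # Möbius addition on the Poincaré ball, Eq. (7.9).
    xy = np.dot(x, y)
    x2, y2 = np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * k * xy + k * y2) * x + (1 - k * x2) * y
    den = 1 + 2 * k * xy + k ** 2 * x2 * y2
    return num / den

def dist(x, y, k):
    # Hyperbolic distance, Eq. (7.10).
    return (2 / np.sqrt(k)) * np.arctanh(
        np.sqrt(k) * np.linalg.norm(mobius_add(-x, y, k)))

def lam(x, k):
    # Conformal factor lambda_x^k = 2 / (1 - k ||x||^2).
    return 2.0 / (1.0 - k * np.dot(x, x))

def exp_map(x, z, k):
    # exp_x^k(z): tangent vector z != 0 at x -> point on the ball, Eq. (7.11).
    zn = np.linalg.norm(z)
    t = np.tanh(np.sqrt(k) * lam(x, k) * zn / 2) * z / (np.sqrt(k) * zn)
    return mobius_add(x, t, k)

def log_map(x, y, k):
    # log_x^k(y): point y != x on the ball -> tangent vector at x, Eq. (7.12).
    u = mobius_add(-x, y, k)
    un = np.linalg.norm(u)
    return (2 / (np.sqrt(k) * lam(x, k))) * np.arctanh(np.sqrt(k) * un) * u / un

# Round trip: exp and log should be mutually inverse (up to numerics).
k = 1.0
x = np.array([0.1, 0.2])
y = np.array([-0.3, 0.05])
print(dist(x, y, k))
print(np.allclose(exp_map(x, log_map(x, y, k), k), y))  # True
```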

Fig. 7.3 The transformation between the tangent space and the hyperbolic space

Note that the initial data lie in the Euclidean space and need to be converted into embeddings in the hyperbolic space. The Euclidean features are therefore first projected onto the hyperbolic manifold so that the spectral-based hyperbolic hypergraph convolutional network can be used to update the node embeddings. Set \(t:=\{\sqrt {k}, 0, 0, \dots , 0\}\in \mathbb {P}^{d, k}\) as a reference point for the tangent space operations. This choice makes \(\langle (0, {\mathbf {x}}^{0, \mathbb {E}}), t\rangle =0\) hold, so \((0, {\mathbf {x}}^{0, \mathbb {E}})\) can be regarded as the initial embedding of the hypergraph structure on the tangent plane \(\mathbb {T}_t\mathbb {P}^{d, k}\) of the hyperbolic manifold space. The initial hypergraph structure embedding is then mapped onto the hyperbolic manifold space \(\mathbb {P}\) following [19]:

$$\displaystyle \begin{aligned} \begin{array}{ll} {\mathbf{x}}^{0, \mathbb{P}} &=\exp _{t}^{k}\left(\left(0, {\mathbf{x}}^{0, \mathbb{E}}\right)\right) \\[0.2cm] &=\left(\sqrt{k} \cosh \left(\frac{\left\|{\mathbf{x}}^{0, \mathbb{E}}\right\|{}_{2}}{\sqrt{k}}\right), \sqrt{k} \sinh \left(\frac{\left\|{\mathbf{x}}^{0, \mathbb{E}}\right\|{}_{2}}{\sqrt{k}}\right) \frac{{\mathbf{x}}^{0, \mathbb{E}}}{\left\|{\mathbf{x}}^{0, \mathbb{E}}\right\|{}_{2}}\right). \end{array} \end{aligned} $$
(7.13)

Unlike the previous study [23], which simply generates the hyperedge structure for convolution in the common domain, here, inspired by HGNN [2], hypergraph computation can be conducted from the perspective of spectral convolution.

Given hyperbolic curvatures \(-1/k_{\ell -1}\) and \(-1/k_{\ell }\) at layers \(\ell -1\) and \(\ell \), respectively, the hyperbolic hypergraph convolution of the hypergraph input signal \({x}^{\mathbb {P}}\) with filter g can be defined as

$$\displaystyle \begin{aligned} \begin{array}{ll} {\mathbf{x}}^{\mathbb{P}} *\mathrm{~g} &=\exp _{x}^{k_{\ell}}\left(\varPhi\left(\left(\varPhi^{\top} \left(\log _{x}^{k_{\ell-1}}\left({x}^{\mathbb{P}}\right)\right)\right) \odot\left(\varPhi^{\top} \mathrm{g}\right)\right)\right) \\[0.3cm] &=\exp _{x}^{k_{\ell}}\left(\varPhi g(\varLambda) \varPhi^{\top} \left(\log _{x}^{k_{\ell-1}}\left({x}^{\mathbb{P}}\right)\right)\right)\text{,} \end{array} \end{aligned} $$
(7.14)

where ⊙ is the element-wise product, \(g(\varLambda )=\operatorname {diag}(\theta )\), and \(\theta =\left [\theta _{1}, \cdots , \theta _{n}\right ]\) are the parameters to be learned. By leveraging the exp and log maps, the tangent space \(\mathbb {T}_{0} \mathbb {P}^{d, k}\) can be used to perform Euclidean transformations. The method operates in the tangent space of each center point \({x}^{\mathbb {P}}\), where the Euclidean approximation is best [19].

Because of the high computational complexity of the Fourier transform and its inverse, this convolution is very expensive to calculate. It can be computed more efficiently by truncating the Chebyshev polynomials as in [2], and can be simply expressed as

$$\displaystyle \begin{aligned} {\mathbf{x}}^{\mathbb{P}} *\mathrm{~g} \approx \exp _{x}^{k_{\ell}}\left(\theta {\mathbf{D}}_{v}^{-1 / 2} \mathbf{H W D}_{e}^{-1} {\mathbf{H}}^{\top} {\mathbf{D}}_v^{-1 / 2}\left(\log _{x}^{k_{\ell-1}}\left({x}^{\mathbb{P}}\right)\right)\right) \text{, } \end{aligned} $$
(7.15)

where W is the initial weight of the hyperedges. The above equation uses the hypergraph Laplacian matrix to calculate the total gain obtained after a small perturbation of a point. For a hypergraph with n vertices, the convolution layer can be denoted by the following formulation:

$$\displaystyle \begin{aligned} {\mathbf{X}}^{\ell} =\exp _{{\mathbf{x}}^{\ell, \mathbb{E}}}^{k_{\ell}}\left(\sigma\left( \mathbf{A}\left(\log _{{\mathbf{x}}^{\ell-1, \mathbb{P}}}^{k_{\ell-1}}\left({\mathbf{X}}^{\ell-1, \mathbb{P}}\right)\right) \varTheta\right)\right)\text{, } \end{aligned} $$
(7.16)

where \(\varTheta \in \mathbb {R}^{c_{\left ( \ell - 1 \right )} \times c_{\left ( \ell \right )} }\) is the parameter to be learned during the training process, which is applied over the vertices in the hypergraph to extract features. c indicates the size of the embedding dimension, σ denotes the nonlinear activation function, and \(\mathbf {A}={\mathbf {D}}_{v}^{-1 / 2} \mathbf {H W D}_{e}^{-1} {\mathbf {H}}^{\top } {\mathbf {D}}_{v}^{-1 / 2}\).

The hyperbolic operation is accomplished by conducting feature mapping between the Euclidean space and the hyperbolic space. The framework of the above spectral-based hyperbolic hypergraph convolution is shown in Fig. 7.4.
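
Combining such maps with Eq. (7.16), the next sketch applies one hyperbolic hypergraph convolution layer, assuming for simplicity that the exp/log maps are taken at the origin of the Poincaré ball and that the curvature k is shared between consecutive layers; A is the normalized matrix defined below Eq. (7.16), and all names and toy inputs are ours.

```python
import numpy as np

def exp0(z, k):
    # exp map at the origin of the Poincaré ball, applied row-wise.
    n = np.linalg.norm(z, axis=-1, keepdims=True).clip(min=1e-12)
    return np.tanh(np.sqrt(k) * n) * z / (np.sqrt(k) * n)

def log0(y, k):
    # log map at the origin of the Poincaré ball, applied row-wise.
    n = np.linalg.norm(y, axis=-1, keepdims=True).clip(min=1e-12)
    return np.arctanh(np.sqrt(k) * n) * y / (np.sqrt(k) * n)

def hyperbolic_hgnn_layer(X_hyp, H, w, Theta, k):
    # Eq. (7.16): log-map to the tangent space, apply the Euclidean
    # hypergraph convolution A X Theta, activate, and exp-map back.
    Dv = np.diag(1.0 / np.sqrt(H @ w))          # Dv^{-1/2}
    De = np.diag(1.0 / H.sum(axis=0))           # De^{-1}
    A = Dv @ H @ np.diag(w) @ De @ H.T @ Dv
    X_tan = log0(X_hyp, k)
    X_tan = np.maximum(A @ X_tan @ Theta, 0.0)  # sigma chosen as ReLU
    return exp0(X_tan, k)

# Toy usage: 4 vertices on the ball, 2 hyperedges.
rng = np.random.default_rng(0)
H = np.array([[1., 0.], [1., 1.], [0., 1.], [1., 1.]])
X = exp0(0.1 * rng.standard_normal((4, 3)), 1.0)   # valid points on the ball
Theta = rng.standard_normal((3, 5))
print(hyperbolic_hgnn_layer(X, H, np.ones(2), Theta, 1.0).shape)  # (4, 5)
```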

Fig. 7.4 The framework of the spectral-based hyperbolic hypergraph convolution method

7.3 Spatial-Based Neural Networks on Hypergraph

To introduce the spatial-based neural networks on hypergraph, we first briefly review the definition of spatial-based graph convolution, taking image processing as an example. The pixels in an image can be represented as vertices in a grid graph, where each vertex only connects to its neighboring vertices in the spatially close region where it is located. A C-channel feature can accordingly be generated for each vertex (pixel) in the image. The process of filtering an image can then be viewed as transforming the features of the neighbors and aggregating (e.g., averaging) them at the central vertex. Similar to convolutional neural networks in image processing, spatial-based graph convolution combines the neighbors of the central vertex to produce a new representation. Spatial-based graph convolution passes information from neighbor vertices to center vertices, which is related to the definition of a path in a simple graph. A path in a graph is defined as P(v 1, v k) = (v 1, v 2, …, v k), where consecutive vertices in the sequence are adjacent, i.e., every vertex pair v i and v i+1 (1 ≤ i ≤ k − 1) has the neighbor relation.
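
As a toy illustration of this neighbor-aggregation view (independent of any particular method), the snippet below transforms vertex features and averages them over neighbors using the row-normalized adjacency matrix of a small path graph; the matrices are made up for illustration.

```python
import numpy as np

# A 4-vertex path graph: 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.randn(4, 3)            # C = 3 channel features per vertex
W = np.random.randn(3, 2)            # a learnable transform

# Spatial graph convolution: transform features, then average over neighbors.
D_inv = np.diag(1.0 / A.sum(axis=1))
X_new = D_inv @ A @ (X @ W)          # each row = mean of its neighbors' X W
print(X_new.shape)                   # (4, 2)
```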

Similar to the spatial-based graph convolution, spatial-based hypergraph neural networks also consider the neighbor information when learning representations. In the following, we introduce two typical spatial-based hypergraph neural networks: general hypergraph neural networks (HGNN+) [16] and dynamic hypergraph neural networks (DHGNN) [15].

7.3.1 General Hypergraph Neural Networks

In this part, the general framework [16] for representation learning with hypergraph neural networks on given raw data is introduced. Figure 7.5 shows the framework of general hypergraph neural networks, which consists of two procedures, i.e., hypergraph modeling and hypergraph convolution. In the hypergraph modeling step, the data are used to generate the high-order correlations, which are represented as a hypergraph. Similar to previous tasks, hyperedge groups can be generated using pairwise edges, k-Hop, and neighbors in the feature space. As a result of this procedure, all available types of hyperedge groups are generated and concatenated into a hypergraph for data correlation modeling. In the hypergraph convolution step, hypergraph convolutions, i.e., the spectral-based convolution or the spatial-based hypergraph convolution, are performed on the constructed hypergraph for representation learning. These convolution procedures can generate much more accurate representations by exploiting the multi-modal data and their high-order correlations.

Fig. 7.5 An illustration of the general hypergraph neural network framework (HGNN+). This figure is from [16]

(1) Hypergraph Modeling

The first step is to construct a flexible hypergraph from the raw data if no hypergraph exists, so that the data correlations can be modeled with a hypergraph structure. The ability to generate a suitable hypergraph structure is critical for exploiting the high-order correlations among the data. Generally, hypergraph structures are not explicit in most cases. Therefore, different strategies are needed to generate the hypergraph. Hypergraph generation from scratch usually involves a combination of three scenarios, namely, data with a graph structure, data without a graph structure, and data with multi-modal/multi-type representations. Hyperedge generation strategies that employ pairwise edges, k-Hop, and neighbors in the feature space, respectively, are introduced here. The strategies using pairwise edges and k-Hop are utilized for hyperedge group generation from data with a graph structure, and the strategy using neighbors in the feature space is employed for hyperedge group generation from data without a graph structure. Finally, all the hyperedge groups are concatenated to generate the overall hypergraph.

The above strategies can be used to generate a number of hyperedge groups, and a final hypergraph is then generated by combining the generated or natural hyperedge groups. Suppose there are K hyperedge groups \(\{ {{\mathbb {E}}}_1, {{\mathbb {E}}}_2, \ldots , {{\mathbb {E}}}_K \}\) with corresponding incidence matrices \({\mathbf {H}}_k \in \{0, 1\}^{N \times M_k}\). For the hypergraph \({\mathbb {G}}\), the simplest way to construct the incidence matrix is to directly concatenate all the hyperedge groups as H = H 1||H 2||⋯||H K, where ⋅||⋅ is the matrix concatenation operation. The hyperedge weights can all be set to 1 so that the hyperedges are treated equally. This simplest fusion strategy is called coequal fusion.
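
A minimal sketch of coequal fusion, assuming two already constructed hyperedge-group incidence matrices over the same vertex set: H is formed by column-wise concatenation, and the hyperedge weight matrix is the identity. The matrices below are toy examples.

```python
import numpy as np

# Two hyperedge groups over the same N = 4 vertices.
H1 = np.array([[1, 0],                               # e.g., from pairwise edges
               [1, 1],
               [0, 1],
               [0, 1]], dtype=float)                 # N x M1
H2 = np.array([[1], [0], [1], [1]], dtype=float)     # e.g., k-NN in feature space, N x M2

# Coequal fusion: H = H1 || H2, all hyperedges weighted equally.
H = np.concatenate([H1, H2], axis=1)                 # N x (M1 + M2)
W = np.eye(H.shape[1])                               # hyperedge weight matrix
print(H.shape, np.diag(W))                           # (4, 3) [1. 1. 1.]
```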

It is noted that other combination strategies can also be used according to different application scenarios. Since a simple coequal fusion cannot fully exploit the multi-modal hybrid high-order correlations, due to differences in information richness between hyperedge groups, an adaptive strategy for the fusion of hyperedge groups, named adaptive fusion, was introduced in [16]. Specifically, each hyperedge group is associated with a trainable parameter that adjusts the effect of the multiple hyperedge groups on the final vertex embedding in an adaptive manner, which can be defined as

$$\displaystyle \begin{aligned} \left \{ \begin{array}{ll} {\mathbf{w}}_k &= \text{copy}(\text{sigmoid}(w_k), M_k)\\[0.1cm] \mathbf{W} &= \text{diag}({\mathbf{w}}_1^1, \ldots, {\mathbf{w}}_1^{M_1}, \ldots, {\mathbf{w}}_K^1, \ldots, {\mathbf{w}}_K^{M_K})\\[0.1cm] \mathbf{H} &= {\mathbf{H}}_1 || {\mathbf{H}}_2 || \cdots || {\mathbf{H}}_K \\ \end{array} \right. , \end{aligned} $$
(7.17)

where \(w_k \in \mathbb {R}\) is a trainable parameter shared by all hyperedges inside the specified hyperedge group k, and sigmoid(⋅) is an element-wise normalization function. \({\mathbf {w}}_k = ({\mathbf {w}}^1_k, \cdots , {\mathbf {w}}^{M_k}_k) \in \mathbb {R}^{M_k}\) denotes the generated weight vector for hyperedge group k. The copy(a, b) function returns a vector of size b whose entries are filled by copying a b times. Let M = M 1 + M 2 + ⋯ + M K denote the total number of hyperedges in all hyperedge groups. \(\mathbf {W} \in \mathbb {R}^{M\times M}\) is a diagonal matrix that indicates the weight matrix of the hypergraph, and each entry W ii denotes the weight of the corresponding hyperedge e i. By concatenating (⋅||⋅) the incidence matrices of the multiple hyperedge groups, the incidence matrix H ∈ {0, 1}N×M of the generated hypergraph is obtained.

Multi-modal/multi-type data can thus be analyzed to generate multiple hyperedge groups. From the constructed hyperedge groups, the hypergraph incidence matrix H and the hyperedge weight matrix W can be generated, which are then fed into the hypergraph convolution layer for further processing.
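
The following PyTorch sketch mirrors Eq. (7.17): one trainable scalar per hyperedge group is squashed by a sigmoid and copied across the hyperedges of that group to build the diagonal weight matrix W; the module name and toy inputs are ours, not part of the HGNN+ release.

```python
import torch
import torch.nn as nn


class AdaptiveFusion(nn.Module):
    """Adaptive fusion of K hyperedge groups, Eq. (7.17)."""

    def __init__(self, group_sizes):          # e.g., [M_1, ..., M_K]
        super().__init__()
        self.group_sizes = list(group_sizes)
        self.w = nn.Parameter(torch.zeros(len(self.group_sizes)))

    def forward(self, H_groups):
        # H_groups: list of K incidence matrices, each of shape (N, M_k).
        weights = [torch.sigmoid(self.w[k]).repeat(m)   # copy(sigmoid(w_k), M_k)
                   for k, m in enumerate(self.group_sizes)]
        W = torch.diag(torch.cat(weights))              # (M, M) weight matrix
        H = torch.cat(H_groups, dim=1)                  # H_1 || ... || H_K
        return H, W


# Toy usage with two groups of 2 and 1 hyperedges over 4 vertices.
H1 = torch.tensor([[1., 0.], [1., 1.], [0., 1.], [0., 1.]])
H2 = torch.tensor([[1.], [0.], [1.], [1.]])
fusion = AdaptiveFusion([2, 1])
H, W = fusion([H1, H2])
print(H.shape, W.shape)   # torch.Size([4, 3]) torch.Size([3, 3])
```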

(2) Hypergraph Convolution

Following Definitions 1, 2, and 3, the aggregation of neighbor vertex messages via the hyperpath is introduced for one spatial hypergraph convolution layer. Given a vertex \(\alpha \in {{\mathbb {V}}}\) of hypergraph \({\mathbb {G}} = \{ {{\mathbb {V}}}, {{\mathbb {E}}}, \mathbf {W} \}\), the aim is to aggregate messages from its hyperedge inter-neighbor set \(\mathbb {N}_e(\alpha )\). To obtain the message of each hyperedge β in the hyperedge inter-neighbor set \(\mathbb {N}_e(\alpha )\), messages are first aggregated from its vertex inter-neighbor set \(\mathbb {N}_v(\beta )\). The two steps of hypergraph convolution thus form a closed loop from the vertex feature set X t to X t+1. A general spatial hypergraph convolution in the t-th layer can be defined as

$$\displaystyle \begin{aligned} \left\{ \begin{array}{ll} & \left. \begin{array}{ll} m^t_{\beta} &= \sum \limits_{\alpha \in \mathbb{N}_v(\beta)} M^t_v(x^t_\alpha) \\[0.2cm] y^t_\beta &= U^t_e(w_{\beta}, m^t_{\beta}) \end{array} \right\} \text{Stage 1} \\[0.2cm] & \left. \begin{array}{ll} m^{t+1}_{\alpha} &= \sum \limits_{\beta \in \mathbb{N}_e(\alpha)} M^t_e(x^t_\alpha, y^t_\beta) \\[0.2cm] x^{t+1}_\alpha &= U^t_v(x^t_\alpha, m^{t+1}_{\alpha}) \end{array} \right\} \text{Stage 2}\\ \end{array} \right. , \end{aligned} $$
(7.18)

where \(x^t_\alpha \in {\mathbf {X}}^t\) denotes the input feature vector of vertex \(\alpha \in {{\mathbb {V}}}\) in layer t = 1, 2, …, T, and \(x^{t+1}_\alpha \) denotes the updated feature of vertex α. \(m^t_\beta \) denotes the message of hyperedge \(\beta \in {{\mathbb {E}}}\), and w β denotes the weight associated with hyperedge β. \(m^{t+1}_\alpha \) denotes the message of vertex α. \(y^t_\beta \) denotes the feature of hyperedge β, an element of the hyperedge feature set \(Y^{t} = \{ y^t_1, y^t_2, \ldots , y^t_M \}\), \(y_i^t \in \mathbb {R}^{C_t}\), in layer t. \(M^t_v(\cdot ), U^t_e(\cdot ), M^t_e(\cdot ), U^t_v(\cdot )\) are the vertex message function, hyperedge update function, hyperedge message function, and vertex update function in the t-th layer, respectively, which can be defined for specific applications.

With the high-order relationships in the hypergraph structure, the spatial hypergraph convolution layer is designed for high-level representation learning. In comparison with graph convolution, which consists of a single stage of message passing, the spatial hypergraph convolution is composed of four flexible operations with learned differentiable functions. As with neighbor relations in a graph, there is no natural ordering among the inter-neighbors of vertices and hyperedges. Therefore, a summation operation is used to aggregate the vertex–hyperedge messages produced by the \(M^t_v(\cdot )\) and \(M^t_e(\cdot )\) operations.

A simple spatial hypergraph convolution layer (named HGNNConv+) via specifying the message-update functions (vertex message function \(M^t_v(\cdot )\), hyperedge update function \(U^t_e(\cdot )\), hyperedge message function \(M^t_e(\cdot )\), and vertex update function \(U^t_v(\cdot )\)) is introduced as

$$\displaystyle \begin{aligned} \left\{ \begin{array}{ll} M^t_v(x^t_\alpha) &= \frac{x^t_\alpha}{| \mathbb{N}_v(\beta) |} \\[0.2cm] U^t_e(w_\beta, m^t_\beta) &= w_\beta \cdot m^t_\beta \\[0.2cm] M^t_e(x^t_\alpha, y^t_\beta) &= \frac{y^t_\beta}{|\mathbb{N}_e(\alpha)|} \\[0.2cm] U^t_v(x^t_\alpha, m^{t+1}_{\alpha}) &= \sigma(m^{t+1}_\alpha \cdot {\varTheta}^t) \end{array} \right. , \end{aligned} $$
(7.19)

where \({ \varTheta }^t \in \mathbb {R}^{C^t\times C^{t+1}}\) denotes a trainable parameter of layer t learned in the training phase, and σ(⋅) denotes an arbitrary nonlinear activation function such as ReLU(⋅). Note that in Eq. (7.19), \(x^t_\alpha / | \mathbb {N}_v(\beta ) |\) and \(y^t_\beta / |\mathbb {N}_e(\alpha )|\) denote the normalized vertex and hyperedge features, which helps stabilize convergence and reduce jittering.

For faster forward propagation of HGNNConv+ on GPU/CPU devices, we rewrite it in matrix format. Consider X t as the input vertex feature set of layer t. From Definitions 1 and 2, H ∈{0, 1}N×M encodes the hyperedge inter-neighbors of each vertex in X t. Hence, it can be used to guide the aggregation of vertex features into the hyperedge feature set Y t, which can be formulated as \({\mathbf {Y}}^t = \mathbf {WD}_e^{-1}{\mathbf {H}}^\top {\mathbf {X}}^t\). In a similar way, the process of updating the vertex feature set X t+1 from the hyperedge feature set Y t can be formulated as \({\mathbf {X}}^{t+1} = \sigma ({\mathbf {D}}_v^{-1}\mathbf {H}{\mathbf {Y}}^t{ \varTheta }^t)\). Thus, the matrix format of HGNNConv+ can be written as

$$\displaystyle \begin{aligned} {\mathbf{X}}^{t+1} = \sigma( {\mathbf{D}}_v^{-1}\mathbf{HWD}_e^{-1}{\mathbf{H}}^\top {\mathbf{X}}^{t}{\varTheta}^{t}). \end{aligned} $$
(7.20)

Similar to HGNN, X t+1 is obtained after the convolution and can be used for further learning. As an extension of HGNN [2], this method employs a broad multi-modal/multi-type data correlation model and learns an adaptive weight for each modality/type representation within a single hypergraph model.
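
A minimal PyTorch sketch of the matrix form in Eq. (7.20), keeping the intermediate hyperedge feature set Y t explicit so that the two-stage message passing remains visible; the class name, the use of ReLU as σ, and the toy inputs are illustrative assumptions.

```python
import torch
import torch.nn as nn


class HGNNPConv(nn.Module):
    """Spatial hypergraph convolution HGNNConv+, Eq. (7.20)."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.theta = nn.Linear(in_channels, out_channels, bias=False)

    def forward(self, X: torch.Tensor, H: torch.Tensor,
                w: torch.Tensor) -> torch.Tensor:
        # X: (N, C), H: (N, M) incidence matrix, w: (M,) hyperedge weights.
        De_inv = torch.diag(1.0 / H.sum(dim=0))          # hyperedge degrees
        Dv_inv = torch.diag(1.0 / (H * w).sum(dim=1))    # weighted vertex degrees
        Y = torch.diag(w) @ De_inv @ H.t() @ X           # stage 1: vertex -> hyperedge
        return torch.relu(Dv_inv @ H @ self.theta(Y))    # stage 2: hyperedge -> vertex


# Toy usage: 4 vertices, 2 hyperedges.
H = torch.tensor([[1., 0.], [1., 1.], [0., 1.], [1., 1.]])
layer = HGNNPConv(8, 16)
print(layer(torch.randn(4, 8), H, torch.ones(2)).shape)  # torch.Size([4, 16])
```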

7.3.2 Dynamic Hypergraph Neural Networks

Dynamic hypergraph neural networks (DHGNN) [15] is a kind of neural network that models dynamically evolving hypergraph structures. It is composed of stacked layers of two modules: dynamic hypergraph construction and hypergraph convolution. The dynamic hypergraph construction module updates the hypergraph structure at each layer, since the initially constructed hypergraph may not be an appropriate representation of the data. After that, hypergraph convolution is introduced as a means of encoding high-order correlations among the data points within the hypergraph. The hypergraph convolution module has two phases, vertex convolution and hyperedge convolution, which are designed to aggregate features among vertices and hyperedges, respectively.

(1) Dynamic Hypergraph Construction

The symbol \( \operatorname {Con}(e) \) is used to denote the vertex set that a hyperedge e contains, and the symbol \( \operatorname {Adj}(v) \) is used to denote the set of all hyperedges containing the vertex v:

$$\displaystyle \begin{aligned} \operatorname{Con}(e) =\left\{v_{1}, v_{2}, \ldots, v_{k_{e}}\right\} , \end{aligned} $$
(7.21)
$$\displaystyle \begin{aligned} \operatorname{Adj}(v) =\left\{e_{1}, e_{2}, \ldots, e_{k_{v}}\right\} \end{aligned} $$
(7.22)

where k e and k v are the number of vertices in hyperedge e and the number of hyperedges containing vertex v, respectively. v is defined as the centroid vertex of the hyperedge set \( \operatorname {Adj}(v) \). Traditional k-NN and k-means clustering methods can be combined for dynamic hypergraph construction to exploit both local and global structures. On the one hand, the k − 1 nearest neighbors are computed for each vertex v; these neighborhood vertices, together with the vertex v itself, form a hyperedge in \( \operatorname {Adj}(v) \). On the other hand, the k-means algorithm is conducted on the whole feature map of each layer according to the Euclidean distance, and for each vertex the nearest S − 1 clusters are assigned as the adjacent hyperedges of this vertex. Here, |Adj(v)| denotes the size of the adjacent hyperedge set, x e denotes the adjacent hyperedge features, x v denotes the centroid vertex feature, and W and b are learnable parameters.

Such a procedure is performed on the feature embedding of each layer. In particular, the hypergraph structure is initialized with the input feature embedding. The hyperedge set is therefore dynamically adjusted as the feature embedding evolves with the network going deeper. In this way, better hypergraph structures can be obtained for high-order data correlation modeling with deep neural networks.
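
The sketch below illustrates the construction described above with plain numpy: one k-NN hyperedge per vertex plus cluster-based adjacent hyperedges from a few Lloyd iterations of k-means; it is a simplified stand-in for the procedure in [15], and the function names and parameters are ours.

```python
import numpy as np

def knn_hyperedges(X, k):
    # One hyperedge per vertex: the vertex itself and its k-1 nearest neighbors.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return [np.argsort(d[i])[:k].tolist() for i in range(len(X))]

def kmeans(X, n_clusters, iters=10, seed=0):
    # A few Lloyd iterations; returns cluster centers and assignments.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(iters):
        assign = np.argmin(
            np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1), axis=1)
        for c in range(n_clusters):
            if np.any(assign == c):
                centers[c] = X[assign == c].mean(axis=0)
    return centers, assign

def cluster_hyperedges(X, centers, assign, s):
    # For each vertex, its nearest s-1 clusters become adjacent hyperedges.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :s - 1]
    clusters = [np.where(assign == c)[0].tolist() for c in range(len(centers))]
    return [[clusters[c] for c in nearest[i]] for i in range(len(X))]

X = np.random.randn(20, 5)                      # current layer embeddings
knn_edges = knn_hyperedges(X, k=4)              # local structure
centers, assign = kmeans(X, n_clusters=3)
cluster_edges = cluster_hyperedges(X, centers, assign, s=2)  # global structure
print(knn_edges[0], cluster_edges[0])
```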

(2) Dynamic Hypergraph Convolution

Hypergraph convolution is composed of two sub-modules: the vertex convolution sub-module and the hyperedge convolution sub-module. In vertex convolution, vertex features are aggregated to the hyperedge; in hyperedge convolution, adjacent hyperedge features are then aggregated to the center vertex.

Several pooling methods can be used here, including maximum pooling and average pooling. Vertex aggregation in state-of-the-art algorithms involves a fixed, pre-computed transform matrix generated from the graph or hypergraph structure. Nevertheless, such methods cannot effectively model the discriminative information among vertex features. For feature permutation and weighting, the transform matrix T is instead learned from the vertex features, which allows information to flow both within and between channels. A multi-layer perceptron (MLP) is used to obtain the transform matrix T, and the transformed features are compressed by a convolution as follows:

$$\displaystyle \begin{aligned} \mathbf{T} =\operatorname{MLP}\left({\mathbf{X}}_{v}\right) \end{aligned} $$
(7.23)

and

$$\displaystyle \begin{aligned} {\mathbf{x}}_{e} =\operatorname{conv}\left(\mathbf{T} \cdot \operatorname{MLP}\left({\mathbf{X}}_{v}\right)\right). \end{aligned} $$
(7.24)

(3) Hyperedge Convolution

Here, the hyperedge convolution follows the spatial convolution strategy, which aggregates hyperedge features to the center vertex features. Hyperedge convolution employs a multi-layer perceptron to generate a weight score for each hyperedge, and the center vertex feature is computed as a weighted sum of the input hyperedge features. This procedure can be formulated as follows:

$$\displaystyle \begin{aligned} w =\operatorname{softmax}\left({\mathbf{x}}_{e} \mathbf{W}+\mathbf{b}\right) \end{aligned} $$
(7.25)

and

$$\displaystyle \begin{aligned} {\mathbf{x}}_{v} =\sum_{i=0}^{|\operatorname{Adj}(v)|} w^{i} {\mathbf{x}}_{e}^{i}. \end{aligned} $$
(7.26)
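
Putting Eqs. (7.23)–(7.26) together, the following PyTorch sketch first aggregates the vertices of one hyperedge into a hyperedge feature (vertex convolution) and then combines the adjacent hyperedge features of a center vertex with MLP-generated weights (hyperedge convolution); the layer sizes and module names are illustrative assumptions rather than the DHGNN reference code.

```python
import torch
import torch.nn as nn


class VertexConv(nn.Module):
    """Vertex convolution: k vertices of a hyperedge -> one hyperedge feature."""

    def __init__(self, k: int, dim: int):
        super().__init__()
        self.trans_mlp = nn.Linear(dim, k)     # produces the transform matrix T
        self.feat_mlp = nn.Linear(dim, dim)
        self.conv = nn.Linear(k, 1)            # compresses k rows into one

    def forward(self, Xv: torch.Tensor) -> torch.Tensor:
        # Xv: (k, dim) features of the vertices contained in one hyperedge.
        T = self.trans_mlp(Xv)                 # (k, k), Eq. (7.23)
        Z = T @ self.feat_mlp(Xv)              # permute/weight vertices, Eq. (7.24)
        return self.conv(Z.t()).squeeze(-1)    # (dim,) hyperedge feature


class EdgeConv(nn.Module):
    """Hyperedge convolution: adjacent hyperedges -> center vertex feature."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)         # W, b of Eq. (7.25)

    def forward(self, Xe: torch.Tensor) -> torch.Tensor:
        # Xe: (|Adj(v)|, dim) adjacent hyperedge features of one center vertex.
        w = torch.softmax(self.score(Xe).squeeze(-1), dim=0)   # Eq. (7.25)
        return w @ Xe                                          # Eq. (7.26)


# Toy usage: hyperedges with k = 3 vertices, 4 adjacent hyperedges, dim = 8.
vconv, econv = VertexConv(k=3, dim=8), EdgeConv(dim=8)
edge_feats = torch.stack([vconv(torch.randn(3, 8)) for _ in range(4)])
print(econv(edge_feats).shape)   # torch.Size([8])
```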

With these deep learning techniques, the graph/hypergraph structure is taken into consideration as prior knowledge for training the model. There are, however, a number of hidden and important relationships that are not directly represented in the inherent structure. For vertex convolution, a transform matrix is employed to permute and weight the vertices within hyperedges; for hyperedge convolution, an attention mechanism is employed to aggregate adjacent hyperedge features. Figure 7.6 shows the architecture of DHGNN. The first part of the figure illustrates the process of hyperedge construction; for example, two hyperedges are generated from two clusters (dashed ellipses). In the second part, vertices within a hyperedge are aggregated to form a hyperedge feature through vertex convolution, and the adjacent hyperedge features are aggregated to form a center vertex feature via hyperedge convolution. In the third part, after performing such operations on all vertices in the current layer feature embedding, the new layer feature embedding and the new hypergraph structure can be constructed.

Fig. 7.6 The DHGNN framework. This figure is from [15]

7.4 Comparison Between Graph and Hypergraph Neural Networks

The previous introduction to the spectral-based and spatial-based hypergraph neural network methods provides a basic understanding of their implementation. In this section, we compare hypergraph neural networks with simple graph neural networks in the spectral and spatial domains to discover the connections and differences between them. The most typical methods of the two families are chosen for this comparison: HGNN [2] and HGNN+ [16] on the hypergraph side, and GNN [18] on the graph side, where GNN denotes the classical convolution operators designed to operate on graphs, such as [6, 18, 24, 25]. In the following, HGNN [2] and HGNN+ [16] are compared with GNN [18] from the spectral perspective and the spatial perspective, respectively. Furthermore, the extended learning domain of the hypergraph emphasizes the connection between the two.

7.4.1 Spectral Perspective

It can be proved that GNN can be mathematically viewed as a special case of HGNN. Under the assumption that every hyperedge connects only two vertices and that all hyperedges have equal weights, the simple hypergraph (2-uniform hypergraph) can also be expressed as a graph with adjacency matrix A and vertex degree matrix D, a construction similar to \(\mathbb {E}_{\text{pair}}\). The hypergraph itself is described by the incidence matrix H, the vertex degree matrix D v, the hyperedge degree matrix D e, and the hyperedge weight matrix W. Under such circumstances, the simple hypergraph satisfies the following reductions:

$$\displaystyle \begin{aligned} \left\{ \begin{array}{ll} \mathbf{H}{\mathbf{H}}^\top &= \mathbf{A} + \mathbf{D} \\[0.2cm] {\mathbf{D}}_e^{-1} &= \frac{1}{2}\mathbf{I} \\[0.2cm] \mathbf{W} &= \mathbf{I} \\ \end{array} \right. . \end{aligned} $$
(7.27)

Using these reductions, the hypergraph convolution can be rewritten as follows:

$$\displaystyle \begin{aligned} \begin{array}{ll} {\mathbf{X}}^{t+1} &= \sigma ( {\mathbf{D}}_v^{-1/2} \mathbf{HWD}_e^{-1} {\mathbf{H}}^\top {\mathbf{D}}_v^{-1/2} {\mathbf{X}}^{t} \varTheta^{t} ) \\[0.2cm] &= \sigma ( {\mathbf{D}}_v^{-1/2} \mathbf{H} (\frac{1}{2}\mathbf{I}) {\mathbf{H}}^\top {\mathbf{D}}_v^{-1/2} {\mathbf{X}}^{t} \varTheta^{t} ) \\[0.2cm] &= \sigma ( \frac{1}{2} {\mathbf{D}}^{-1/2} (\mathbf{A+D}) {\mathbf{D}}^{-1/2} {\mathbf{X}}^{t} \varTheta^{t} ) \\[0.2cm] &= \sigma ( \frac{1}{2} ( \mathbf{I} + {\mathbf{D}}^{-1/2} \mathbf{A} {\mathbf{D}}^{-1/2} ) {\mathbf{X}}^{t} \varTheta^{t} ) \\[0.2cm] &= \sigma ( {\mathbf{D}}^{-1/2} \hat{\mathbf{A}} {\mathbf{D}}^{-1/2} {\mathbf{X}}^{t} \hat{\varTheta}^{t} ) \\ \end{array} , \end{aligned} $$
(7.28)

where \(\hat {\mathbf {A}} = \mathbf {I} + {\mathbf {D}}^{-1/2}\mathbf {A}{\mathbf {D}}^{-1/2}\) and \(\hat {\varTheta }^{t} = \frac {1}{2} \varTheta ^t\). The extra \(\frac {1}{2}\) can be absorbed by the learnable parameter Θ. Thus, when modeling a simple graph, the spectral-based hypergraph convolution in HGNN [2] exhibits the same formulation as the graph convolution in GCN [18]. Owing to its powerful expressive capability, the hypergraph convolution not only models and learns the high-order correlations in the hypergraph but also has the ability to handle the simple graph.
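
The reduction in Eqs. (7.27) and (7.28) can be checked numerically. The snippet below builds the 2-uniform hypergraph of a small cycle graph and verifies that H H⊤ = A + D and that the HGNN propagation matrix equals one half of I + D −1∕2 A D −1∕2 (the factor absorbed by Θ); the toy graph is ours.

```python
import numpy as np

# A small graph: the edges of a 4-cycle.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
N, M = 4, len(edges)

A = np.zeros((N, N))
H = np.zeros((N, M))                      # 2-uniform hypergraph incidence
for m, (u, v) in enumerate(edges):
    A[u, v] = A[v, u] = 1
    H[u, m] = H[v, m] = 1

D = np.diag(A.sum(axis=1))                # vertex degree matrix
Dv_is = np.diag(A.sum(axis=1) ** -0.5)    # Dv^{-1/2} (Dv = D here)
De_inv = 0.5 * np.eye(M)                  # each hyperedge has degree 2
W = np.eye(M)                             # equal hyperedge weights

print(np.allclose(H @ H.T, A + D))        # Eq. (7.27): H H^T = A + D

hgnn_op = Dv_is @ H @ W @ De_inv @ H.T @ Dv_is
gcn_op = 0.5 * (np.eye(N) + Dv_is @ A @ Dv_is)
print(np.allclose(hgnn_op, gcn_op))       # Eq. (7.28), up to the absorbed 1/2
```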

7.4.2 Spatial Perspective

A powerful GNN model can be viewed as learning to embed the rooted subtree into a low-dimensional space [26]. The rooted subtree [27] not only describes the connections of local vertices but also describes the message passing paths in a graph. The rooted subtree can therefore be used to compare HGNN+ [16] with GNN [18]. In a hypergraph, a node in the rooted subtree can be either a vertex or a hyperedge in order to satisfy the path definition (also known as the message passing path).

Comparing graph structures that are isomorphic is more straightforward. Therefore, a 2-uniform hypergraph (each hyperedge connects only two vertices) is used for the comparison. Figure 7.7 displays the rooted subtrees of HGNN+ [16] and GNN [18] for a specified vertex, which can also be expressed as the message path in the graph and the hyperpath in the hypergraph. It is obvious that graph convolution takes the vertex features of the neighbors into account and aggregates them to update the central vertex feature at the end of the process. This layer can be described as a hierarchical structure that enables more powerful expression and modeling capabilities. HGNN+ [16] performs a two-stage, i.e., vertex–hyperedge–vertex, transformation. As formulated in Eq. (7.18), the first stage generates a hyperedge feature from the vertex inter-neighbors of the hyperedge; the features of the hyperedge inter-neighbors are then aggregated to obtain the updated vertex features. Additionally, multi-layer hypergraph convolution has many more message interactions than graph convolution. The rooted vertex appears more frequently in the subtree paths of HGNN+ [16] (like a latent extra self-loop), which accounts for its better performance. In comparison with graph convolution, hypergraph convolution can efficiently extract low- and high-order correlations on the hypergraph via the vertex–hyperedge–vertex transformation.

Fig. 7.7 Comparison of rooted subtree of graph and 2-uniform hypergraph. This figure is from [16]

7.5 Summary

In this chapter, we introduce two types of hypergraph neural network learning: spectral-based and spatial-based methods. In spectral-based methods, the hypergraph signal is transformed between the common domain and the spectral domain by computing the Laplacian matrix. In spatial-based methods, each node is updated by aggregating information from neighboring nodes in the spatial domain. We also note that most existing methods in graph learning are still simple graph neural networks.

Finally, we compare hypergraph neural networks and graph neural networks from the spectral-based and spatial-based perspectives discussed above. According to the comparison of the convolution formulations, hypergraph convolution not only has expressive ability comparable to GCN when handling a simple graph but is also capable of modeling and learning high-order correlations within the hypergraph. Comparing hypergraph convolution with graph convolution in the spatial domain, we find that the hypergraph convolution layer can efficiently extract both low-order and high-order correlations on the hypergraph using the vertex–hyperedge–vertex transformation.