Introduction

Hypergraphs [7] provide a natural way to model complex patterns of component connectivity in the real world. Unlike graphs, hypergraphs can encode non-pair-wise relations, which carry richer information. With the development of deep network-based learning methods, hypergraphs have been widely applied in many domains, including pose estimation [16, 29] and brain state classification [6, 32].

Fig. 1: Representation of higher order neighborhoods in hypergraphs. The papers of an author form a hyperedge. Author 2 co-authored paper P2 with author 1, so hyperedge 2 is a 1-order neighbor of hyperedge 1. Similarly, through the connection of paper P3, hyperedge 3 is a 2-order neighbor of hyperedge 1

Recently, researchers have proposed several hypergraph-based neural network frameworks [2, 12, 19, 41, 45, 46]. Most of these methods focus on hypergraph expansion or on extending different network structures. In the Hypergraph Neural Network (HGNN) [12], messages propagate through a hypergraph Laplacian operator on a clique-expanded hypergraph, following a node-hyperedge-node propagation strategy. HyperGCN [41] approximates hyperedges as pair-wise edges, thereby converting the hypergraph learning problem into a graph learning problem. Moreover, unified frameworks [19, 46] for hypergraphs and graphs have emerged as a recent trend. Generally, these methods are designed around a message passing process that confines nodes and hyperedges to their 1-order neighborhood in a single propagation step. However, nodes and hyperedges with the same attributes do not exist only in the 1-order neighborhood. For example, a paper co-authored by multiple authors can be considered a node, and the papers containing the same author are connected by a hyperedge. As shown in Fig. 1, paper P2 in hyperedge 1 is shared by two authors, so hyperedge 2 is a 1-order neighbor of hyperedge 1; paper P3 in hyperedge 2 is likewise shared, so hyperedge 3 is a 1-order neighbor of hyperedge 2 and hence a 2-order neighbor of hyperedge 1. Such connections provide a way to reveal patterns of cross-domain collaboration. Therefore, hypergraphs serve as a powerful representation for retaining information through deeper and more complex connectivity relationships. Furthermore, some works [1, 31] on graph learning focus on neighborhood expansion of the adjacency matrix. The method in [20] uses powers of the incidence matrix to obtain higher order relationships, but it cannot be adapted to hypergraphs with arbitrary hyperedge sizes. However, a larger receptive field also means that nodes may receive more performance-degrading noise. Although the higher order neighborhood encapsulates a rich representation, it also brings additional challenges, and it remains an open problem to effectively extract valuable information from the complex higher order neighborhood of objects while maintaining the lower order neighborhood information.

To address the above challenges, we propose Multi-Order Hypergraph Convolutional Networks Integrated with Self-Supervised Learning (MO-HGCN), where the multi-order representation is maintained by a multi-channel network. We first perform k-th order truncated Chebyshev expansions of the spectral convolution to obtain spectral 2-order and spectral 3-order hypergraph convolution operators. These operators are constructed as independent hypergraph convolution layers and modeled as a 2-order channel and a 3-order channel, respectively. In addition, we adaptively adjust the node features in the 1-order hypergraph convolution and use it as an enhanced information channel. Then, we propose an inter-order attention mechanism to learn the contrastive information among the different order neighborhoods. By assigning the attention scores to the node embeddings of the higher order channels, the low-order neighborhood information is brought into focus. To extract valuable information from the higher order channels, we learn distinct representations in a self-supervised manner by incorporating contrastive learning based on mutual information maximization. Finally, we fuse the node embeddings learned from the 2-order and 3-order channels to form the complete multi-order embeddings and optimize the network weights by joint learning. Compared with existing methods, MO-HGCN is a semi-supervised node classification model that combines self-supervised learning to obtain a multi-order neighborhood representation of nodes. The main contributions of this paper are as follows:

  • We propose a Multi-Order Hypergraph Convolutional Network Integrated with Self-Supervised Learning (MO-HGCN) that explicitly captures the complex relationships of higher order neighborhoods through spectral high-order hypergraph convolution operators and obtains a multi-order representation through a multi-channel network.

  • We propose an inter-order attention mechanism to maintain the information of low-order neighborhoods and learn the distinct representation of higher order neighborhoods by a mutual information maximization strategy in a self-supervised learning manner.

  • We conduct extensive experiments on several hypergraph datasets, and the results show the effectiveness of MO-HGCN compared with the state-of-the-art.

Related works

Hypergraph neural networks

In recent years, hypergraphs have gained attention among researchers, and hypergraph-based representation learning methods have developed rapidly. Feng et al. [12] propose the Hypergraph Neural Network (HGNN), a general framework that implements the message passing strategy on hypergraphs with a hyperedge convolutional layer. To avoid the limitations of a fixed, inherent hypergraph structure, Jiang et al. [22] propose a dynamic hypergraph neural network that updates the hypergraph structure. Bai et al. [4] propose two trainable operators, namely hypergraph convolution and hypergraph attention, which can be extended and migrated into neural networks. Besides, some studies propose new hypergraph representation learning frameworks, such as HNHN [10], Hyper-SAGNN [45], HyperSAGE [2], and HGC-RNN [43].

In the exploration of hypergraph structure, HyperGCN [41] enables hypergraphs to be trained with graph convolutional networks by approximating hyperedges as pair-wise edges. Bandyopadhyay et al. [5] apply graph convolution to the line graph of the hypergraph to accommodate variable-sized hyperedges. Yang et al. [42] treat vertices and hyperedges equally to solve the symmetric information loss problem of data co-occurrence. Various hypergraph-based applications [13] are also evolving, such as pose estimation [16, 29], link prediction [11], recommendation [23, 38, 39, 44], and brain state classification [6, 32].

A recent trend of combining hypergraphs with graph network methods has emerged, driven by the data modeling advantages of the non-pair-wise relations in hypergraphs. Huang et al. [19] propose a framework for modeling the message passing process in graph and hypergraph neural networks. Zhang et al. [46] consider hypergraphs with edge-dependent vertex weights, propose the generic hypergraph spectral convolution networks (GHSC), and present various variants of hypergraph neural networks.

Fig. 2: The framework of MO-HGCN

Self-supervised learning

Self-supervised learning [21, 24, 30] is currently receiving considerable attention in deep learning, serving downstream tasks by learning useful information from unlabeled data. Self-supervised learning has a wide range of applications in computer vision [3, 18], natural language processing [28], and graph learning [17, 25, 34, 37].

One popular approach in graph learning is mutual information maximization, i.e., global–local contrast. Hjelm et al. [18] introduce mutual information maximization on images by proposing Deep InfoMax (DIM). DIM adapts to different downstream tasks through global–local contrast; e.g., local features are suitable for classification tasks. Veličković et al. [37] extend this paradigm to graph learning and propose Deep Graph Infomax (DGI). DGI performs global–local neighborhood contrasts on graphs, enabling nodes to learn both global and local structural information. InfoGraph [34] maximizes the mutual information between graph-level representations and substructure representations at different scales to learn a global graph representation, and it also learns rich representations from labeled and unlabeled datasets through semi-supervised learning.

The mutual information maximization strategy has also been extended to certain tasks on hypergraphs. Xia et al. [39] propose a dual channel hypergraph convolutional network, which employs self-supervised learning as an auxiliary task to enhance the performance of session recommendation. Yu et al. [44] use the higher order relations of hypergraphs to capture complex relationships between users and compensate for the information loss of multi-channel networks with multi-layer mutual information maximization. These works investigate the impact of mutual information maximization for different types of information, while our work explores the implications of the mutual information maximization strategy in higher order neighborhoods.

Method

In this section, we describe the proposed Multi-Order Hypergraph Convolutional Network Integrated with Self-Supervised Learning (MO-HGCN) in detail. As shown in Fig. 2, MO-HGCN consists of a 2-order channel, a 3-order channel, and an enhanced information channel, with the 2-order and 3-order channels as the outputs. Specifically, we design spectral 2-order and 3-order hypergraph convolution operators to obtain higher order information. Considering the importance of node features and the preservation of 1-order neighborhoods, we propose an inter-order attention mechanism for the multi-order hypergraph convolutional network. Our goal is to fuse the multi-order information to obtain a multi-level representation of the nodes. We further introduce self-supervised learning on the hypergraph, i.e., mutual information maximization between different orders, to capture the distinctive higher order information of the nodes.

Preliminaries

Given a hypergraph \(\mathcal {G} = \left( \mathcal {V}, \mathcal {E}, \textbf{W} \right) \), \(\mathcal {V} = \left\{ v_1, v_2, \ldots , v_n\right\} \) is a vertex set with n nodes, and \(\mathcal {E} = \left\{ e_1, e_2, \ldots , e_m\right\} \) is a hyperedge set with m hyperedges. The hyperedge weight matrix \(\textbf{W}\) is a diagonal matrix whose diagonal entries are the hyperedge weights \(\left\{ W_1, W_2, \ldots , W_m\right\} \). The hypergraph \(\mathcal {G}\) can be represented by an incidence matrix \(\textbf{H} \in \mathbb {R}^{|\mathcal {V}| \times |\mathcal {E}|}\), whose entries are defined as

$$\begin{aligned} h(v, e) = {\left\{ \begin{array}{ll} 1, &{} \text{ if } v \in e,\\ 0, &{} \text{ if } v \notin e. \end{array}\right. } \end{aligned}$$
(1)

The Laplacian [47] \(\varvec{\Delta }\) of a hypergraph \(\mathcal {G}\) is defined as

$$\begin{aligned} \varvec{\Delta } = \textbf{I} - \textbf{D}_v^{-1/2} \textbf{H} \textbf{W} \textbf{D}_e^{-1} \textbf{H}^T \textbf{D}_v^{-1/2}, \end{aligned}$$
(2)

where \(\textbf{D}_v\) is the diagonal matrix of vertex degrees and \(\textbf{D}_e\) is the diagonal matrix of hyperedge degrees.
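For concreteness, the following NumPy sketch assembles the incidence matrix, the degree matrices, the normalized operator D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}, and the Laplacian of Eq. (2); the toy hyperedge list and unit weights are illustrative assumptions, not data from the paper.

```python
import numpy as np

# Toy hypergraph used only for illustration: 5 nodes, 3 hyperedges.
hyperedges = [[0, 1, 2], [1, 3], [2, 3, 4]]
n, m = 5, len(hyperedges)

# Incidence matrix H (Eq. 1): H[v, e] = 1 iff node v belongs to hyperedge e.
H = np.zeros((n, m))
for e, nodes in enumerate(hyperedges):
    H[nodes, e] = 1.0

W = np.eye(m)                              # hyperedge weight matrix (all ones here)
Dv = np.diag((H @ W).sum(axis=1))          # vertex degrees d(v) = sum_e w(e) h(v, e)
De = np.diag(H.sum(axis=0))                # hyperedge degrees delta(e) = sum_v h(v, e)

Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(Dv)))
De_inv = np.diag(1.0 / np.diag(De))

# Normalized operator Theta and the hypergraph Laplacian Delta = I - Theta (Eq. 2).
Theta = Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt
Delta = np.eye(n) - Theta
```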

Alternatively, given a hypergraph \(\mathcal {G} = \left( \mathcal {V}, \mathcal {E}, \textbf{W}, \varvec{\Delta } \right) \) with n nodes and m hyperedges, the hypergraph Laplacian \(\varvec{\Delta } \in \mathbb {R}^{n \times n}\) can be decomposed into a matrix of orthonormal eigenvectors \(\Phi \) and a diagonal matrix of non-negative eigenvalues \(\Lambda \). For a hypergraph \(\mathcal {G}\) and a signal \(\varvec{x}\), the spectral convolution of \(\varvec{x}\) with a filter \(\varvec{g}\) is expressed as

$$\begin{aligned} \varvec{g} \star \varvec{x} = \Phi \varvec{g}(\Lambda ) \Phi ^{T} \varvec{x}, \end{aligned}$$
(3)

where the symbol \(\star \) represents the convolution operator and \(\varvec{g}(\Lambda )\) indicates the Fourier coefficients. The function \(\varvec{g}(\Lambda )\) is further parameterized as a K-order polynomial expressed by the truncated Chebyshev expansion. The Chebyshev polynomials satisfy the recurrence \(T_k(x) = 2xT_{k-1}(x) - T_{k-2}(x)\) with \(T_0(x)=1\) and \(T_1(x)=x\). With the truncated Chebyshev expansion, the spectral convolution is approximately expressed as

$$\begin{aligned} \varvec{g} \star \varvec{x} \approx \sum _{k=0}^K \theta _k T_k(\tilde{\varvec{\Delta }}) \varvec{x}, \end{aligned}$$
(4)

where \(T_k(\tilde{\varvec{\Delta }})\) denotes the k-th order Chebyshev polynomial and \(\tilde{\varvec{\Delta }} = \frac{2}{\lambda _{\max }} \varvec{\Delta } - \textbf{I}\) is the scaled Laplacian, with \(\lambda _{\max } \approx 2\) following works [12, 26]. Therefore, the spectral 1-order hypergraph convolution can be defined as

$$\begin{aligned} \textbf{X}^{(l+1)} = \textbf{D}_v^{-1/2} \textbf{H} \textbf{W} \textbf{D}_e^{-1} \textbf{H}^T \textbf{D}_v^{-1/2}\textbf{X}^{(l)}\theta ^{(l)}. \end{aligned}$$
(5)
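As a minimal illustration (not the authors' released code), the 1-order layer of Eq. (5) can be implemented in PyTorch by left-multiplying the transformed node features with a precomputed dense operator; the module name and tensor shapes are our assumptions.

```python
import torch
import torch.nn as nn

class HypergraphConv1(nn.Module):
    """1-order spectral hypergraph convolution: X' = Theta X theta (Eq. 5)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)  # learnable filter theta

    def forward(self, X, Theta):
        # Theta is the (n, n) operator D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}.
        return Theta @ self.theta(X)

# Example usage with random data (shapes only):
layer = HypergraphConv1(in_dim=16, out_dim=8)
X = torch.randn(5, 16)
Theta_t = torch.rand(5, 5)           # stands in for the precomputed operator above
out = torch.relu(layer(X, Theta_t))  # (5, 8)
```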

Multi-order hypergraph convolutional network

On the basis of the spectral convolution theory of hypergraphs, we further develop operators that enable nodes and hyperedges to interact within higher order neighborhoods.

For higher order hypergraph convolution, we extend the order of the neighborhoods from the perspective of spectral hypergraph convolution. Setting k = 2 in the Chebyshev expansion of Eq. (4) yields the spectral 2-order hypergraph convolution operator as follows:

$$\begin{aligned} \begin{aligned} \varvec{g} \star \varvec{x}&\approx \sum _{k=0}^2 \theta _k T_k(\tilde{\varvec{\Delta }}) \varvec{x} \\&= \theta _0 T_0(\tilde{\varvec{\Delta }})\textbf{x} + \theta _1 T_1(\tilde{\varvec{\Delta }})\textbf{x} + \theta _2 T_2(\tilde{\varvec{\Delta }})\textbf{x} \\&= \theta _0 \textbf{x} - \theta _1 \varvec{\Theta } \textbf{x} + 2 \theta _2 \varvec{\Theta }^2 \textbf{x} - \theta _2 \textbf{I} \textbf{x}, \end{aligned} \end{aligned}$$
(6)

where \(\theta _0\), \(\theta _1\), and \(\theta _2\) denote the parameters of the filter \(\varvec{g}\), and \(\varvec{\Theta } = \textbf{D}_v^{-1/2} \textbf{H} \textbf{W} \textbf{D}_e^{-1} \textbf{H}^T \textbf{D}_v^{-1/2}\). Following works [12, 26], to avoid over-parameterization, we reduce the multiple parameters to a single parameter \(\theta \) by setting

$$\begin{aligned} {\left\{ \begin{array}{ll} \theta _2 = -\frac{1}{2} \theta ,\\ \theta _1 = - \varvec{\Theta } \theta ,\\ \theta _0 = \varvec{\Theta }^2 \theta . \end{array}\right. } \end{aligned}$$
(7)

Thus, the spectral 2-order hypergraph convolution operator can be simplified as follows:

$$\begin{aligned} \begin{aligned}&\varvec{g} \star \varvec{x} \approx \theta \left( \varvec{\Theta }^2 + \frac{1}{2}\textbf{I} \right) \textbf{x}. \end{aligned} \end{aligned}$$
(8)

With the spectral 2-order hypergraph convolution operator, the 2-order hypergraph convolution of signal \(\textbf{X}\) can be defined as

$$\begin{aligned} \textbf{X}^{(l+1)} = \left( \varvec{\Theta }^2 + \frac{1}{2}\textbf{I} \right) \textbf{X}^{(l)}\theta ^{(l)}, \end{aligned}$$
(9)

where \(\theta \in \mathbb {R}^{C_\textrm{in} \times C_\textrm{out}}\) represents the learnable parameter.

Similarly, the spectral 3-order hypergraph convolution operator is derived as

$$\begin{aligned} \begin{aligned} \varvec{g} \star \varvec{x}&\approx \sum _{k=0}^3 \hat{\theta }_k T_k(\tilde{\varvec{\Delta }}) \varvec{x} \\&= \hat{\theta }_0 T_0(\tilde{\varvec{\Delta }})\textbf{x} + \hat{\theta }_1 T_1(\tilde{\varvec{\Delta }})\textbf{x} + \hat{\theta }_2 T_2(\tilde{\varvec{\Delta }})\textbf{x} + \hat{\theta }_3 T_3(\tilde{\varvec{\Delta }})\textbf{x} \\&= \hat{\theta }_0 \textbf{x} - \hat{\theta }_1 \varvec{\Theta } \textbf{x} + 2 \hat{\theta }_2 \varvec{\Theta }^2 \textbf{x} - \hat{\theta }_2 \textbf{I} \textbf{x} - 4 \hat{\theta }_3 \varvec{\Theta }^3 \textbf{x} + 3 \hat{\theta }_3 \varvec{\Theta } \textbf{x}, \end{aligned} \end{aligned}$$
(10)

where \(\hat{\theta }_0\), \(\hat{\theta }_1\), \(\hat{\theta }_2\), and \(\hat{\theta }_3\) denote the parameters of the filter \(\varvec{g}\). We again use a single parameter \(\hat{\theta }\) to avoid over-parameterization by setting

$$\begin{aligned} {\left\{ \begin{array}{ll} \hat{\theta }_3 = \frac{1}{2} \hat{\theta },\\ \hat{\theta }_2 = \frac{1}{2} \varvec{\Theta }\hat{\theta },\\ \hat{\theta }_1 = \frac{1}{2} \varvec{\Theta }^2 \hat{\theta },\\ \hat{\theta }_0 = -\frac{1}{2} \varvec{\Theta }^3 \hat{\theta }. \end{array}\right. } \end{aligned}$$
(11)

With the single parameter and writing \(\theta = -2\hat{\theta }\), the spectral 3-order hypergraph convolution operator can be simplified as

$$\begin{aligned} \begin{aligned} \varvec{g} \star \varvec{x}&\approx -2\hat{\theta }\left( \varvec{\Theta }^3 - \frac{1}{2}\varvec{\Theta } \right) \textbf{x} \\&= \theta \left( \varvec{\Theta }^3 - \frac{1}{2}\varvec{\Theta } \right) \textbf{x}. \end{aligned} \end{aligned}$$
(12)

Thus, the 3-order hypergraph convolution of signal \(\textbf{X}\) can be defined as

$$\begin{aligned} \textbf{X}^{(l+1)} = \left( \varvec{\Theta }^3 - \frac{1}{2}\varvec{\Theta } \right) \textbf{X}^{(l)}\theta ^{(l)}, \end{aligned}$$
(13)

where \(\theta \in \mathbb {R}^{C_\textrm{in} \times C_\textrm{out}}\) represents the learnable parameter.

In MO-HGCN, the number of layers in Eqs. (9) and (13) (i.e., the maximum l) is set to 2. Therefore, the backbone of MO-HGCN is a multi-channel network with two layers per channel. This design preserves the information of different order neighborhoods and facilitates the nodes to learn multi-order representations.
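A sketch of one higher order channel under these definitions is given below; the operators of Eqs. (8) and (12) are precomputed once from the normalized operator, and the inter-order attention applied between the two layers (next subsection) is omitted here. Module and function names are ours, not the authors'.

```python
import torch
import torch.nn as nn

def high_order_operators(Theta: torch.Tensor):
    """Precompute the spectral 2-order and 3-order operators of Eqs. (8) and (12)."""
    I = torch.eye(Theta.shape[0], dtype=Theta.dtype)
    Theta2 = Theta @ Theta
    op2 = Theta2 + 0.5 * I               # 2-order operator: Theta^2 + (1/2) I
    op3 = Theta2 @ Theta - 0.5 * Theta   # 3-order operator: Theta^3 - (1/2) Theta
    return op2, op3

class HighOrderChannel(nn.Module):
    """A two-layer channel built on one fixed high-order operator (Eqs. 9 and 13)."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.theta1 = nn.Linear(in_dim, hid_dim, bias=False)
        self.theta2 = nn.Linear(hid_dim, out_dim, bias=False)

    def forward(self, X, op):
        Z = torch.relu(op @ self.theta1(X))  # first-layer embedding (z^{o2} or z^{o3})
        return op @ self.theta2(Z), Z        # second-layer output and first-layer embedding
```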

Inter-order attention

The spectral high-order hypergraph convolution operators allow each node to aggregate information from distant nodes and hyperedges. Such information may not be directly usable for learning, which raises the challenge of regulating the involvement of low-order information. Therefore, we propose an inter-order attention mechanism that measures the similarity between higher order and low-order neighborhoods. Unlike previous attention mechanisms, we compare attention between different orders of the same node rather than among neighboring nodes. In particular, we design an enhanced information channel based on 1-order hypergraph convolution that augments the nodes' own information. The convolution process of this channel is as follows:

$$\begin{aligned} \varvec{h} = \theta \textbf{D}_v^{-1/2} \textbf{H} \textbf{W} \textbf{D}_e^{-1} \textbf{H}^T \textbf{D}_v^{-1/2}\textbf{x} + \theta \beta \textbf{I} \textbf{x}, \end{aligned}$$
(14)

where \(\varvec{h}\) represents the enhanced node embedding and \(\beta \in \mathbb {R}^N\) denotes a learnable parameter that assigns a different self-loop weight to each node.
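A minimal sketch of this enhanced information channel, assuming a dense operator Theta and treating beta as a length-N parameter vector as stated above (the module name is ours):

```python
import torch
import torch.nn as nn

class EnhancedChannel(nn.Module):
    """1-order convolution with a learnable per-node self-loop weight beta (Eq. 14)."""
    def __init__(self, n_nodes, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)
        self.beta = nn.Parameter(torch.zeros(n_nodes))  # one self-loop weight per node

    def forward(self, X, Theta):
        Xt = self.theta(X)                               # X theta
        return Theta @ Xt + self.beta.unsqueeze(1) * Xt  # Theta X theta + diag(beta) X theta
```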

Fig. 3: The inter-order attention mechanism of MO-HGCN

As shown in Fig. 3, we obtain the node embeddings \(\varvec{z}^{o2}\) and \(\varvec{z}^{o3}\) of the 2-order and 3-order channels after the first layer of convolution. Then, the attention mechanism is applied between \(\varvec{z}^{o2}\) and \(\varvec{h}\), and between \(\varvec{z}^{o3}\) and \(\varvec{h}\), respectively.

The attention scores of the node embeddings \(\varvec{z}_{i}^{l} \in \mathbb {R}^K\) and \(\varvec{z}_{i}^{h}\in \mathbb {R}^K\) in the j-th feature dimension, for the low-order channel and the higher order channel, are calculated by Eq. (15)

$$\begin{aligned} \alpha _{i j}=\frac{\exp \left( {\text {LeakyReLU}}\left( MLP\left( \varvec{z}^{l}_{ij} \Vert \varvec{z}^{h}_{ij}\right) \right) \right) }{\sum _{k \in i} \exp \left( {\text {LeakyReLU}}\left( MLP\left( \varvec{z}^{l}_{ik} \Vert \varvec{z}^{h}_{ik}\right) \right) \right) }, \end{aligned}$$
(15)

where \(\varvec{z}^{l}_{ij}\) and \(\varvec{z}^{h}_{ij}\) represent the j-th feature dimension of node i, \(MLP(\bullet ):\mathbb {R}^{2 \times K} \rightarrow \mathbb {R}^K\) is the feature mapping function, set as a multi-layer perceptron, and \(\Vert \) denotes the concatenation operation. Therefore, the attention score \(\alpha ^{o2}\) between the 2-order channel and the enhanced information channel is calculated as

$$\begin{aligned} \alpha ^{o2}=\frac{\exp \left( {\text {LeakyReLU}}\left( MLP\left( \varvec{h}_{ij} \Vert \varvec{z}^{o2}_{ij}\right) \right) \right) }{\sum _{k \in i} \exp \left( {\text {LeakyReLU}}\left( MLP\left( \varvec{h}_{ik} \Vert \varvec{z}^{o2}_{ik}\right) \right) \right) }. \end{aligned}$$
(16)

The attention score \(\alpha ^{o3}\) between the 3-order channel and the enhanced information channel is calculated as

$$\begin{aligned} \alpha ^{o3}=\frac{\exp \left( {\text {LeakyReLU}}\left( MLP\left( \varvec{h}_{ij} \Vert \varvec{z}^{o3}_{ij}\right) \right) \right) }{\sum _{k \in i} \exp \left( {\text {LeakyReLU}}\left( MLP\left( \varvec{h}_{ik} \Vert \varvec{z}^{o3}_{ik}\right) \right) \right) }. \end{aligned}$$
(17)

The attention scores \(\alpha ^{o2}\) and \(\alpha ^{o3}\) are then applied to the higher order node embeddings \(\varvec{z}^{o2}\) and \(\varvec{z}^{o3}\) to enhance the most relevant node representations between channels; the processes are represented as

$$\begin{aligned} \varvec{\hat{z}}^{o2}= & {} \alpha ^{o2} \varvec{z}^{o2}, \end{aligned}$$
(18)
$$\begin{aligned} \varvec{\hat{z}}^{o3}= & {} \alpha ^{o3} \varvec{z}^{o3}. \end{aligned}$$
(19)

To preserve the original higher order messages of the node embeddings, we fuse the node embeddings \(\varvec{\hat{z}}^{o2}\) and \(\varvec{\hat{z}}^{o3}\) with the node embeddings \(\varvec{z}^{o2}\) and \(\varvec{z}^{o3}\), respectively. The final higher order channel node embeddings \(\varvec{\tilde{z}}^{o2}\) and \(\varvec{\tilde{z}}^{o3}\) are obtained as follows:

$$\begin{aligned} \varvec{\tilde{z}}^{o2}= & {} \lambda \varvec{z}^{o2} + (1 - \lambda ) \varvec{\hat{z}}^{o2}, \end{aligned}$$
(20)
$$\begin{aligned} \varvec{\tilde{z}}^{o3}= & {} \mu \varvec{z}^{o3} + (1 - \mu ) \varvec{\hat{z}}^{o3}, \end{aligned}$$
(21)

where \(\lambda \in \mathbb {R}\) and \(\mu \in \mathbb {R}\) denote learnable parameters restricted to [0, 1] that control the contribution of the two node embeddings.
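Under our reading of Eqs. (15)–(21), the inter-order attention and fusion for one higher order channel can be sketched as follows; the MLP width, the LeakyReLU slope, and the parameterization of λ (or μ) as a clamped scalar are our assumptions.

```python
import torch
import torch.nn as nn

class InterOrderAttention(nn.Module):
    """Attention between the enhanced embedding h and one higher order embedding z."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Linear(2 * dim, dim)          # MLP: R^{2K} -> R^K as in Eq. (15)
        self.act = nn.LeakyReLU(0.2)
        self.mix = nn.Parameter(torch.tensor(0.5))  # lambda (or mu) of Eq. (20)/(21)

    def forward(self, h, z):
        # h, z: (n, K) node embeddings of the enhanced and higher order channels.
        scores = self.act(self.mlp(torch.cat([h, z], dim=1)))  # (n, K)
        alpha = torch.softmax(scores, dim=1)   # normalize over the K feature dimensions
        z_att = alpha * z                      # Eq. (18)/(19)
        lam = torch.clamp(self.mix, 0.0, 1.0)  # restrict the mixing weight to [0, 1]
        return lam * z + (1.0 - lam) * z_att   # fused embedding, Eq. (20)/(21)
```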

Self-supervised learning auxiliary task

Multi-order hypergraph convolutional networks enable nodes to learn representations at multiple levels, further improving model performance. However, the channels of the multi-channel structure are independent of each other, and the higher order information usually contains varying degrees of redundancy. Therefore, it is worth considering how to extract the distinctive information from the multi-order hypergraph convolutional network. Inspired by the performance gains that mutual information maximization brings to Deep Graph Infomax (DGI) [37], we extend mutual information maximization to the inter-order setting to guide the model to reduce feature redundancy. Specifically, we construct contrastive learning between the enhanced information channel and each higher order channel. For the 2-order channel, the positive sample pair is \(\left( \varvec{h}_{ij}, \varvec{\tilde{z}}^{o2}_{ij}\right) \) and the negative sample pair is \(\left( \varvec{\hat{h}}_{ij}, \varvec{\tilde{z}}^{o2}_{ij}\right) \), where \(\varvec{\hat{h}}_{ij}\) denotes the negative sample obtained by row-wise shuffling. We utilize InfoNCE [18] as the loss function for contrastive learning, as follows:

$$\begin{aligned} \begin{aligned} \mathcal {L}_{s1}&=- \sum _{i\in \mathcal {V}}\left( \sum _{j\in k}\log \sigma \left( \mathcal {S}\left( \varvec{h}_{ij}, \varvec{\tilde{z}}^{o2}_{ij}\right) \right) \right. \\&\quad \left. + \sum _{j\in k}\log \sigma \left( 1-\mathcal {S}\left( \varvec{\hat{h}}_{ij}, \varvec{\tilde{z}}^{o2}_{ij}\right) \right) \right) , \end{aligned} \end{aligned}$$
(22)

where \(\mathcal {S}\left( \bullet , \bullet \right) \) denotes the discriminator function, implemented as the dot product.

For the 3-order channel, the positive sample pair is \(\left( \varvec{h}_{ij}, \varvec{\tilde{z}}^{o3}_{ij}\right) \) and the negative sample pair is \(\left( \varvec{\hat{h}}_{ij}, \varvec{\tilde{z}}^{o3}_{ij}\right) \), where \(\varvec{\hat{h}}_{ij}\) denotes the negative sample with row-wise shuffling. The InfoNCE loss function \(\mathcal {L}_{s2}\) is defined as

$$\begin{aligned} \begin{aligned} \mathcal {L}_{s2}&=- \sum _{i\in \mathcal {V}}\left( \sum _{j\in k}\log \sigma \left( \mathcal {S}\left( \varvec{h}_{ij}, \varvec{\tilde{z}}^{o3}_{ij}\right) \right) \right. \\&\quad \left. + \sum _{j\in k}\log \sigma \left( 1-\mathcal {S}\left( \varvec{\hat{h}}_{ij}, \varvec{\tilde{z}}^{o3}_{ij}\right) \right) \right) . \end{aligned} \end{aligned}$$
(23)
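A sketch of the per-channel loss in Eqs. (22) and (23), with row-wise shuffling for the negative samples and an element-wise product as the discriminator; the numerical-stability constant and the sum reduction are our assumptions, not details stated by the authors.

```python
import torch

def channel_mi_loss(h: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Contrastive loss between the enhanced channel h and one higher order channel z,
    following Eqs. (22)-(23): positives pair aligned rows, negatives use a row-wise
    shuffled copy of h."""
    h_neg = h[torch.randperm(h.shape[0])]                    # row-wise shuffling
    pos = torch.log(torch.sigmoid(h * z) + 1e-10)            # log sigma(S(h, z~))
    neg = torch.log(torch.sigmoid(1.0 - h_neg * z) + 1e-10)  # log sigma(1 - S(h_hat, z~))
    return -(pos.sum() + neg.sum())
```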
Algorithm 1: The overall process of MO-HGCN
Table 1 Summary of the hypergraph datasets used

Model learning

The node embeddings \(\varvec{\tilde{z}}^{o2}\) and \(\varvec{\tilde{z}}^{o3}\) are fed into the second layer of the multi-order hypergraph convolutional network. The outputs of the second layer are denoted \(\varvec{X}^{(2)} \in \mathbb {R}^{N\times q}\) and \(\varvec{X}^{(3)}\in \mathbb {R}^{N\times q}\), where q is the number of classes. To conduct node classification, we adopt a summation strategy to fuse the multi-channel information by Eq. (24)

$$\begin{aligned} \varvec{\hat{X}} = \varvec{X}^{(2)} + \varvec{X}^{(3)}. \end{aligned}$$
(24)

Then, we apply the softmax function to \(\varvec{\hat{X}}\) to predict the labels \(\hat{Y}\). The cross-entropy loss for node classification is defined as follows:

$$\begin{aligned} \mathcal {L}_{c} = - \sum _{i \in \mathcal {V}_L} \sum _{j=1}^q Y_{ij}\ln \hat{Y}_{ij}, \end{aligned}$$
(25)

where \(Y_{ij}\) denotes the true label of node i in the labeled node set \(\mathcal {V}_L\).

Therefore, the joint learning loss function is as follows:

$$\begin{aligned} \mathcal {L} = \mathcal {L}_{c} + \eta _1\mathcal {L}_{s1} + \eta _2\mathcal {L}_{s2}, \end{aligned}$$
(26)

where \(\eta _1\) and \(\eta _2\) are hyperparameters that control the contribution of the self-supervised losses. Algorithm 1 summarizes the overall process of MO-HGCN.
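For illustration, one training step of the joint objective in Eq. (26) might look as follows, reusing the channel_mi_loss sketch above; the model interface and argument names are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def training_step(model, X, Theta, y, train_mask, optimizer, eta1=0.005, eta2=0.005):
    """One step of the joint objective L = L_c + eta1 * L_s1 + eta2 * L_s2 (Eq. 26).
    `model` is assumed to return the fused logits X_hat = X^(2) + X^(3) together with
    the embeddings h, z~^{o2}, z~^{o3}; this interface is ours."""
    model.train()
    optimizer.zero_grad()
    logits, h, z2, z3 = model(X, Theta)
    loss_c = F.cross_entropy(logits[train_mask], y[train_mask])  # Eq. (25)
    loss = loss_c + eta1 * channel_mi_loss(h, z2) + eta2 * channel_mi_loss(h, z3)
    loss.backward()
    optimizer.step()
    return loss.item()
```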

Experiments

In this section, we conduct experiments and validate our model by answering the following questions.

  • Q1: How does MO-HGCN perform on the node classification task?

  • Q2: How does high-order spectral hypergraph convolution perform compared to 1-order spectral hypergraph convolution?

  • Q3: How does the inter-order attention mechanism contribute to the performance of MO-HGCN?

  • Q4: How sensitive is the performance of MO-HGCN to its parameter settings?

  • Q5: How does the self-supervised learning component affect the effectiveness of MO-HGCN?

Datasets

For the semi-supervised node classification task on hypergraphs, we use the five hypergraph datasets provided by HyperGCN [41] for validation. These datasets include co-citation networks and co-authorship networks. A summary of the datasets is shown in Table 1, and the details are as follows:

  • Co-citation datasets: The original sources of the co-citation hypergraph datasets are Cora, Citeseer, and PubMed. In the hypergraph construction, all documents are created as nodes, and the documents cited by the same document are grouped into a hyperedge. Hyperedges containing only one node are removed, and the node features are the bag-of-words vectors of the documents.

  • Co-authorship datasets: The original sources of the co-authorship hypergraph datasets are DBLP and Cora. In the hypergraph construction, all papers are considered as nodes, and the papers authored by the same author are grouped into a hyperedge. The nodes are characterized by the bag-of-words vectors of the papers.

Table 2 Test accuracy (%) of node classification on hypergraph datasets

Baselines

We compare the proposed method with state-of-the-art baselines that include a variety of hypergraph neural networks combined with different neural network models. The details of these approaches are as follows:

  • MLP+HLR [2]: The method is a multi-layer perceptron using explicit hypergraph Laplacian for regularization.

  • HyperGCN [41]: HyperGCN approximates the hypergraph learning problem as a graph problem through pair-wise edges and provides a variant, FastHyperGCN, that reduces the training time.

  • HyperSAGE [2]: HyperSAGE utilizes a two-level neural message passing strategy to propagate information in the hypergraph and combines the different neighborhood aggregation approaches of GraphSAGE [15].

  • HGNN [12]: HGNN introduces the symmetric normalized hypergraph Laplacian [47] operator by means of spectral hypergraph theory and provides a general framework for hypergraph neural networks.

  • UniGCN [19]: A variant of UniGNN, which unifies the message passing process of graphs and hypergraphs into one framework, extending graph neural network designs naturally to hypergraphs.

  • UniGAT [19]: The method extends the aggregation process of Graph Attention Networks [36] to hypergraphs, so that nodes learn the attention weights of neighboring hyperedges.

  • UniGIN [19]: The method uses the mechanism of Graph Isomorphism Networks [40] to enhance expressiveness by having nodes aggregate information from neighboring hyperedges.

  • UniSAGE [19]: The method is a variant of GraphSAGE [15], which adapts to different tasks by means of different aggregation functions.

  • H-ChebNet [46]: A variant derived from the general hypergraph spectral convolution framework combined with ChebNet [9].

  • H-APPNP [46]: H-APPNP is a hypergraph convolutional network with APPNP [27] as the backbone network.

  • H-SSGC [46]: The method extends the SSGC [48] to a general hypergraph spectral convolution framework.

  • H-GCN [46]: H-GCN is a general hypergraph spectral convolution framework with Graph Convolutional Networks [26] as the backbone network.

  • H-GCNII [46]: H-GCNII extends GCNII [8] to the general hypergraph spectral convolution framework, which is a deep network structure.

Fig. 4: Performance of different channels and components in MO-HGCN. The vertical axis denotes accuracy and the horizontal axis the different channels and components: 1-order denotes the 1-order approximation of hypergraph convolution, i.e., HGNN [12]; 2-order and 3-order denote the corresponding channels in MO-HGCN; Multi-order denotes MO-HGCN consisting only of the 2-order and 3-order channels; Inter-order attention denotes MO-HGCN with the inter-order attention component; Self-supervised denotes MO-HGCN with all channels and components

Experiments settings

For the semi-supervised node classification task, we use accuracy (ACC) to evaluate the performance of the model. In the experimental setup, we use the Adam optimizer and train for 2000 epochs. For the Cora (co-citation and co-authorship) and Citeseer datasets, the learning rate is 0.005 and the L2 regularization is 0.05. For the DBLP and PubMed datasets, the learning rate is 0.05 and the L2 regularization is 0.002. For the hyperparameters \(\eta _1\) and \(\eta _2\), both are set to 0.005 on the Cora datasets (co-citation and co-authorship) and to 0.001 on the other datasets. Each dataset has ten different train-test splits with a consistent train-test ratio. We follow the protocol of work [19] to test the datasets. For the baselines, we cite the experimental results reported in the original papers, since the compared datasets and evaluation metric are consistent.
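These settings could be organized, for example, as the following illustrative configuration (our layout, not the authors' released code):

```python
# Illustrative layout of the settings described above; keys and structure are ours.
CONFIGS = {
    "cora_cocitation":   {"lr": 0.005, "weight_decay": 0.05,  "eta1": 0.005, "eta2": 0.005},
    "cora_coauthorship": {"lr": 0.005, "weight_decay": 0.05,  "eta1": 0.005, "eta2": 0.005},
    "citeseer":          {"lr": 0.005, "weight_decay": 0.05,  "eta1": 0.001, "eta2": 0.001},
    "pubmed":            {"lr": 0.05,  "weight_decay": 0.002, "eta1": 0.001, "eta2": 0.001},
    "dblp":              {"lr": 0.05,  "weight_decay": 0.002, "eta1": 0.001, "eta2": 0.001},
}
EPOCHS = 2000  # Adam optimizer, 2000 epochs, evaluated over ten train-test splits
```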

Fig. 5: Distribution of inter-order attention scores

Fig. 6: Node classification results for MO-HGCN with different assignments of \(\eta _1\) and \(\eta _2\). The horizontal axis represents \(\eta _2\) and the vertical axis represents \(\eta _1\)

Fig. 7: 2D t-SNE visualization of node embeddings on the co-citation and co-authorship datasets. The first row represents the co-citation Cora dataset, the second row the co-citation Citeseer dataset, and the third row the co-authorship Cora dataset. MO-HGCN without self-supervised learning is denoted MO-HGCN (w/o)

Experimental results

Performance analysis

We report the mean accuracy and standard deviation of the experimental results in Table 2, with the best results in bold and the second-best results underlined. The results show the accuracy advantage of our model over the state-of-the-art, with improvements of 2.0%, 4.0%, 3.1%, and 1.0% on the co-citation Cora, Citeseer, PubMed, and co-authorship Cora datasets, respectively. The best-performing method on the co-authorship DBLP dataset is H-GCNII. The experimental results in Table 2 answer question \({\textbf {Q1}}\): the proposed model outperforms the state-of-the-art baselines.

In contrast to the pair-wise edge approximation of HyperGCN, MO-HGCN utilizes clique expansion to approximate the hypergraph structure, as HGNN does, and the multi-order approximate neighborhood further enlarges the receptive field of nodes and hyperedges. This is why MO-HGCN performs better than HyperGCN and HGNN. Compared with models that combine hypergraphs with other GNN methods, which absorb the advantages of different GNN structures and perform well, the results show that multi-order hypergraph convolutional networks combined with inter-order attention and self-supervised learning can more fully exploit the structural information of the hypergraph. The standard deviations across multiple splits also demonstrate that MO-HGCN achieves stability similar to that of the models incorporating hypergraph and GNN methods.

Ablation study

We report in Fig. 4 the performance of different channels and components of MO-HGCN to investigate their contributions. As shown in Fig. 4, \({\textbf {1-order}}\) on the horizontal axis is the 1-order approximation of hypergraph convolution, which serves as the reference for the higher order channels. \({\textbf {2-order}}\) and \({\textbf {3-order}}\) denote hypergraph convolutional networks using only the 2-order channel and the 3-order channel, respectively. \({\textbf {Multi-order}}\) represents MO-HGCN consisting only of the 2-order and 3-order channels. \({\textbf {Inter-order attention}}\) denotes the multi-order hypergraph convolutional network with inter-order attention. \({\textbf {Self-supervised}}\) represents the multi-order hypergraph network with both inter-order attention and self-supervised learning.

As can be observed from Fig. 4, the 2-order channel always performs better than the 3-order channel, while the 3-order channel outperforms the 1-order channel in most cases but remains inferior to the 2-order channel. This indicates that the long-range information in higher order neighborhoods is not always directly applicable. The results of \({\textbf {Multi-order}}\) show that the fusion of multi-order information allows the model to learn multiple levels of representation, further improving performance. The self-supervised learning component delivers a significant boost compared to inter-order attention alone, suggesting a more prominent role in extracting the distinctive part of the higher order information. The results in Fig. 4 answer question \({\textbf {Q2}}\): the channels and components contribute differently, with inter-order attention and self-supervised learning taking full advantage of the information brought by multi-order neighborhoods.

Effectiveness of inter-order attention

We use box plots in Fig. 5 to report the distribution of attention scores produced by the inter-order attention mechanism, addressing question \({\textbf {Q3}}\). The 2-order and 3-order entries in Fig. 5 represent the attention score distributions between the 2-order channel and the enhanced information channel, and between the 3-order channel and the enhanced information channel, respectively.

As shown in Fig. 5, the distribution of attention scores generated by the inter-order attention mechanism is concentrated in the lower score region, which indicates a large discrepancy between the node embeddings generated by the higher order channels and the enhanced information channel. This also verifies that the nodes are able to receive information from higher order neighborhoods. Higher order channel node embeddings that are more similar to those of the enhanced information channel are assigned higher scores; as a result, information with low similarity to the 1-order neighborhood (which contains the node's own information) receives a lower weight in the fusion of embeddings.

Effectiveness of self-supervised learning

We conduct sensitivity experiments on the hyperparameters of self-supervised learning, i.e., question \({\textbf {Q4}}\), by varying \(\eta _1\) and \(\eta _2\) over a range of values. Figure 6 reports the node classification accuracy of MO-HGCN for different \(\eta _1\) and \(\eta _2\) settings and shows that the performance of MO-HGCN is stable when \(\eta _1\) and \(\eta _2\) are chosen within a suitable range. Moreover, since self-supervised learning serves as an auxiliary task, \(\eta _1\) and \(\eta _2\) contribute more to the performance of MO-HGCN at smaller values.

To investigate the impact of self-supervised learning on the multi-order hypergraph convolutional network, i.e., question \({\textbf {Q5}}\), we visualize the node embeddings generated by HyperGCN, HGNN, MO-HGCN without self-supervised learning, and MO-HGCN. As shown in Fig. 7, we use t-SNE [35] to reduce the dimension of the node embeddings and project them into 2D coordinates to draw clusters, with each color representing a different class of nodes. To produce clear distributions, we test on the Cora and Citeseer datasets, which have a small number of nodes, and report the clustering coefficients in Table 3.

In Fig. 7, MO-HGCN produces clear clusters of node embeddings, and it also achieves higher clustering coefficients in Table 3 than MO-HGCN without self-supervised learning. This shows that self-supervised learning helps the node embeddings capture distinctive information, which improves the separation of the node embeddings.

Conclusions

In this paper, we propose a Multi-Order Hypergraph Convolutional Network integrated with self-supervised learning (MO-HGCN) to explore the potential of hypergraphs over higher order neighborhoods. MO-HGCN adopts a multi-channel network structure in which the higher order channels are built on the spectral 2-order and spectral 3-order hypergraph convolution operators, respectively. We further design an enhanced information channel and, through inter-order attention, preserve low-order neighborhood information. To mine distinctive information in the higher order channels, we introduce self-supervised learning as an auxiliary task to enhance the performance of MO-HGCN. Experiments show that MO-HGCN is competitive with state-of-the-art baselines and that it exploits the potential of higher order neighborhoods through the inter-order attention and self-supervised learning components. In future work, we would like to explore hypergraphs with heterogeneous nodes to investigate higher order neighborhood problems on heterogeneous hypergraphs.

Table 3 Clustering coefficients on co-citation and co-authorship datasets