Abstract
Hypergraphs, as a powerful representation of information, naturally and effectively depict complex, non-pair-wise relationships in the real world. Hypergraph representation learning is useful for exploring the complex relationships implicit in hypergraphs. However, most methods focus on 1-order neighborhoods and ignore higher order neighborhood relationships in the hypergraph structure, which often leads to underutilization of that structure. In this paper, we exploit the potential of higher order neighborhoods in hypergraphs for representation learning and propose a Multi-Order Hypergraph Convolutional Network Integrated with Self-Supervised Learning (MO-HGCN). We first build a multi-channel network on the hypergraph with high-order spectral convolution operators that capture multi-order representations of nodes. Then, we introduce an inter-order attention mechanism to preserve low-order neighborhood information. Finally, to extract valid embeddings from higher order neighborhoods, we incorporate a self-supervised learning strategy based on mutual information maximization into the multi-order hypergraph convolutional network. Experiments on several hypergraph datasets show that the proposed model is competitive with state-of-the-art baselines, and ablation studies confirm the effectiveness of exploiting higher order neighborhoods, the inter-order attention mechanism, and the self-supervised learning strategy.
Introduction
Hypergraphs [7] provide a natural way to model complex patterns of connectivity among components in the real world. Compared with graphs, hypergraphs can represent non-pair-wise relations, which carry richer information. With the development of deep learning methods, hypergraphs have been widely applied in many domains, including pose estimation [16, 29] and brain state classification [6, 32].
Recently, researchers have proposed several hypergraph-based neural network frameworks [2, 12, 19, 41, 45, 46]. Most of these methods focus on hypergraph expansion or on extending different network structures. In the Hypergraph Neural Network (HGNN) [12], messages propagate through a hypergraph Laplacian operator derived from the clique expansion of the hypergraph, following a node-hyperedge-node propagation strategy. HyperGCN [41] approximates hyperedges with pair-wise edges, thereby converting the hypergraph learning problem into a graph learning problem. Moreover, unified frameworks for hypergraphs and graphs [19, 46] have recently emerged as a trend. These methods are generally built on message passing, which confines nodes and hyperedges to their 1-order neighborhood in a single propagation step. However, nodes and hyperedges with the same attributes do not exist only in the 1-order neighborhood. For example, in a co-authorship hypergraph, each paper is a node, and the papers written by the same author are connected by a hyperedge. As shown in Fig. 1, paper P2 in hyperedge 1 shares two authors, so hyperedge 2 is a 1-order neighbor of hyperedge 1. Likewise, paper P3 in hyperedge 2 shares two authors, so hyperedge 3 is a 1-order neighbor of hyperedge 2 and hence a 2-order neighbor of hyperedge 1. Such connections can reveal patterns of cross-domain collaboration. Hypergraphs therefore serve as a powerful representation that retains information through deeper and more complex connectivity. Furthermore, some graph learning works [1, 31] expand node neighborhoods through the adjacency matrix. The method in [20] uses powers of the incidence matrix to obtain higher order relationships, but it cannot be adapted to hypergraphs with arbitrary hyperedge sizes. Moreover, a larger receptive field means that nodes may receive more noise that degrades performance.
Although higher order neighborhoods encapsulate rich information, they also bring more challenges. How to effectively extract valuable information from the complex higher order neighborhoods of objects while preserving low-order neighborhood information remains an open problem.
To address these challenges, we propose Multi-Order Hypergraph Convolutional Networks Integrated with Self-Supervised Learning (MO-HGCN), which maintains multi-order representations through a multi-channel network. We first expand the spectral convolution with Chebyshev polynomials of order k = 2 and k = 3 to obtain spectral 2-order and spectral 3-order hypergraph convolution operators. The two operators are built as independent hypergraph convolution layers and modeled as a 2-order channel and a 3-order channel, respectively. In addition, we adaptively adjust node features in a 1-order hypergraph convolution and use it as an enhanced information channel. We then propose an inter-order attention mechanism to learn the contrastive information among neighborhoods of different orders. By assigning attention scores to the node embeddings of the higher order channels, low-order neighborhood information is brought into focus. To extract valuable information from the higher order channels, we learn distinctive representations in a self-supervised manner through contrastive learning based on mutual information maximization. Finally, we fuse the node embeddings learned by the 2-order and 3-order channels into complete multi-order embeddings and optimize the network weights by joint learning. Unlike existing methods, MO-HGCN is a semi-supervised node classification model that incorporates self-supervised learning to obtain multi-order neighborhood representations of nodes. The main contributions of this paper are as follows:
-
We propose a Multi-Order Hypergraph Convolutional Network Integrated with Self-Supervised Learning (MO-HGCN) that explicitly captures the complex relationships of higher order neighborhoods through spectral high-order hypergraph convolution operators and obtains multi-order representations through a multi-channel network.
-
We propose an inter-order attention mechanism to preserve low-order neighborhood information, and we learn distinctive representations of higher order neighborhoods through mutual information maximization in a self-supervised manner.
-
We conduct extensive experiments on several hypergraph datasets, and the results show the effectiveness of MO-HGCN compared with state-of-the-art baselines.
Related works
Hypergraph neural networks
In recent years, hypergraphs have gained attention among researchers, and hypergraph-based representation learning methods have developed rapidly. Feng et al. [12] propose the Hypergraph Neural Network (HGNN), a general framework that implements message passing on the hypergraph with a hyperedge convolutional layer. To overcome the limitations of a fixed hypergraph structure, Jiang et al. [22] propose a dynamic hypergraph neural network that updates the hypergraph structure. Bai et al. [4] propose two trainable operators, Hypergraph Convolution and Hypergraph Attention, that can be extended and transferred to other neural networks. Other studies propose new hypergraph representation learning frameworks, such as HNHN [10], Hyper-SAGNN [45], HyperSAGE [2], and HGC-RNN [43].
In exploring hypergraph structure, HyperGCN [41] enables hypergraphs to be trained with graph convolutional networks by approximating hyperedges as pair-wise edges. Bandyopadhyay et al. [5] apply graph convolution on the line graph of the hypergraph to accommodate variable-sized hyperedges. Yang et al. [42] treat vertices and hyperedges equally to solve the symmetric information loss problem of data co-occurrence. Hypergraph-based applications [13] are also evolving, such as pose estimation [16, 29], link prediction [11], recommendation [23, 38, 39, 44], and brain state classification [6, 32].
A recent trend of combining hypergraphs with graph network methods has emerged because the non-pair-wise relations in hypergraphs bring advantages for data modeling. Huang et al. [19] propose a framework that unifies the message passing processes of graph and hypergraph neural networks. Zhang et al. [46] consider hypergraphs with edge-dependent vertex weights, propose generic hypergraph spectral convolution networks (GHSC), and derive various variants of hypergraph neural networks.
Self-supervised learning
Self-supervised learning [21, 24, 30] is currently receiving considerable attention in deep learning, serving downstream tasks by learning useful information in unlabeled data. Self-supervised learning has a wide range of applications in computer vision [3, 18], natural language processing [28], and graph learning [17, 25, 34, 37].
One popular approach in graph self-supervised learning is mutual information maximization, i.e., global–local contrast. Hjelm et al. [18] apply mutual information maximization to images by proposing Deep InfoMax (DIM). DIM adapts to different downstream tasks through global–local contrast; for example, local features are suitable for classification tasks. Veličković et al. [37] extend this paradigm to graph learning with Deep Graph Infomax (DGI), which contrasts global and local neighborhood representations so that nodes learn both global and local structural information. InfoGraph [34] maximizes the mutual information between graph-level representations and substructure representations at different scales to learn global graph representations, and it also learns rich representations from labeled and unlabeled data through semi-supervised learning.
Mutual information maximization has also been extended to tasks on hypergraphs. Xia et al. [39] propose a dual channel hypergraph convolutional network that employs self-supervised learning as an auxiliary task to enhance session recommendation. Yu et al. [44] use the higher order relations of hypergraphs to capture complex relationships between users and compensate for the information loss of multi-channel networks with multi-layer mutual information maximization. These works investigate the impact of mutual information maximization on different types of information, whereas our work explores the role of mutual information maximization in higher order neighborhoods.
Method
In this section, we describe the proposed Multi-Order Hypergraph Convolutional Network Integrated with Self-Supervised Learning (MO-HGCN) in detail. As shown in Fig. 2, MO-HGCN consists of a 2-order channel, a 3-order channel, and an enhanced information channel, with the 2-order and 3-order channels producing the outputs. Specifically, we design spectral 2-order and 3-order hypergraph convolution operators to obtain higher order information. Considering the importance of node features and the preservation of 1-order neighborhoods, we propose an inter-order attention mechanism in the multi-order hypergraph convolutional network. Our goal is to fuse multi-order information to obtain multi-level node representations. We further introduce self-supervised learning on the hypergraph, i.e., mutual information maximization between different orders, to capture distinctive higher order information of the nodes.
Preliminaries
Given a hypergraph \(\mathcal {G} = \left( \mathcal {V}, \mathcal {E}, \textbf{W} \right) \), \(\mathcal {V}\) is a vertex set \(\left\{ v_1, v_2, \ldots , v_n\right\} \) with n nodes, and \(\mathcal {E}\) is a hyperedge set \(\left\{ e_1, e_2, \ldots , e_m\right\} \) with m hyperedges. \(\textbf{W}\) is a diagonal matrix whose diagonal entries are the hyperedge weights \(\left\{ W_1, W_2, \ldots , W_m\right\} \). The hypergraph \(\mathcal {G}\) can be represented by an incidence matrix \(\textbf{H} \in \mathbb {R}^{|\mathcal {V}| \times |\mathcal {E}|}\) whose entries are \(h(v, e) = 1\) if \(v \in e\) and \(h(v, e) = 0\) otherwise.
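The Laplacian [47] \(\varvec{\Delta }\) of a hypergraph \(\mathcal {G}\) is defined as \(\varvec{\Delta } = \textbf{I} - \textbf{D}_v^{-1/2} \textbf{H} \textbf{W} \textbf{D}_e^{-1} \textbf{H}^T \textbf{D}_v^{-1/2}\),
where \(\textbf{D}_v\) is the diagonal matrix of vertex degrees and \(\textbf{D}_e\) is the diagonal matrix of hyperedge degrees.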
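As a concrete illustration of these definitions, the NumPy sketch below builds \(\textbf{H}\) from a list of hyperedges and forms the normalized Laplacian of [47], \(\varvec{\Delta } = \textbf{I} - \textbf{D}_v^{-1/2}\textbf{H}\textbf{W}\textbf{D}_e^{-1}\textbf{H}^T\textbf{D}_v^{-1/2}\), with vertex degrees weighted by hyperedge weights. The function names and the toy hypergraph are hypothetical, unit hyperedge weights are assumed by default, and every vertex is assumed to belong to at least one hyperedge.

```python
import numpy as np

def incidence_matrix(hyperedges, n):
    """Build H (n x m): H[v, e] = 1 if vertex v belongs to hyperedge e, else 0."""
    H = np.zeros((n, len(hyperedges)))
    for e, members in enumerate(hyperedges):
        H[list(members), e] = 1.0
    return H

def hypergraph_laplacian(H, w=None):
    """Return (Delta, Theta) with Delta = I - Theta and
    Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}.

    Assumes every vertex lies in at least one hyperedge (no zero degrees).
    """
    n, m = H.shape
    w = np.ones(m) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w                      # vertex degree: summed weights of incident hyperedges
    d_e = H.sum(axis=0)              # hyperedge degree: number of member vertices
    Dv_inv_sqrt = np.diag(d_v ** -0.5)
    Theta = Dv_inv_sqrt @ H @ np.diag(w) @ np.diag(1.0 / d_e) @ H.T @ Dv_inv_sqrt
    return np.eye(n) - Theta, Theta

# Toy hypergraph: 5 vertices, 3 hyperedges.
H = incidence_matrix([{0, 1, 2}, {2, 3}, {3, 4}], n=5)
Delta, Theta = hypergraph_laplacian(H)
print(np.allclose(Delta, Delta.T))               # True: symmetric
print(np.linalg.eigvalsh(Delta).min() > -1e-9)   # True: positive semi-definite
```

The two printed checks reflect known properties of this Laplacian: it is symmetric and positive semi-definite, and the vector \(\textbf{D}_v^{1/2}\textbf{1}\) lies in its null space.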
Given a hypergraph \(\mathcal {G} = \left( \mathcal {V}, \mathcal {E}, \textbf{W}, \varvec{\Delta } \right) \) with n nodes and m hyperedges, the hypergraph Laplacian \(\varvec{\Delta } \in \mathbb {R}^{n \times n}\) can be eigendecomposed as \(\varvec{\Delta } = \Phi \Lambda \Phi ^T\), where \(\Phi \) contains the orthonormal eigenvectors and \(\Lambda \) is the diagonal matrix of non-negative eigenvalues. For a hypergraph \(\mathcal {G}\), the spectral convolution of a signal \(\varvec{x}\) with a filter \(\varvec{g}\) is \(\varvec{g} \star \varvec{x} = \Phi \varvec{g}(\Lambda ) \Phi ^T \varvec{x}\),
where \(\star \) denotes the convolution operator and \(\varvec{g}(\Lambda )\) is a function of the Fourier coefficients. The function \(\varvec{g}(\Lambda )\) is further parameterized as a truncated Chebyshev expansion of order K. The Chebyshev polynomials are defined recursively by \(T_k(x) = 2xT_{k-1}(x) - T_{k-2}(x)\), with \(T_0(x)=1\) and \(T_1(x)=x\). With this truncated expansion, the spectral convolution is approximated as \(\varvec{g} \star \varvec{x} \approx \sum _{k=0}^{K} \theta _k T_k(\tilde{\varvec{\Delta }}) \varvec{x}\),
where \(T_k(\tilde{\varvec{\Delta }})\) is the k-th order Chebyshev polynomial evaluated at the scaled Laplacian \(\tilde{\varvec{\Delta }} = \frac{2}{\lambda _{\max }} \varvec{\Delta } - \textbf{I}\), and \(\lambda _{\max }\) is the largest eigenvalue of \(\varvec{\Delta }\). Following [12, 26], we approximate \(\lambda _{\max } \approx 2\). Therefore, the spectral 1-order hypergraph convolution can be defined as
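To make the Chebyshev machinery concrete, the sketch below evaluates \(T_k(\tilde{\varvec{\Delta }})\) with the recursion above and confirms that, once \(\lambda _{\max }\) is set to 2, the scaled Laplacian reduces to \(-\varvec{\Theta }\), where \(\varvec{\Theta } = \textbf{D}_v^{-1/2} \textbf{H} \textbf{W} \textbf{D}_e^{-1} \textbf{H}^T \textbf{D}_v^{-1/2}\) is the propagation matrix used by the higher order operators in this section. This relies on the Laplacian of [47] having the form \(\varvec{\Delta } = \textbf{I} - \varvec{\Theta }\). The helper names and the three-node hypergraph are illustrative assumptions, and the filter coefficients are arbitrary.

```python
import numpy as np

def chebyshev_terms(L_scaled, K):
    """Return [T_0(L), ..., T_K(L)] using T_k = 2 L T_{k-1} - T_{k-2}."""
    terms = [np.eye(L_scaled.shape[0]), L_scaled]
    for _ in range(2, K + 1):
        terms.append(2.0 * L_scaled @ terms[-1] - terms[-2])
    return terms[: K + 1]

def chebyshev_filter(L_scaled, x, thetas):
    """Truncated expansion sum_k thetas[k] * T_k(L_scaled) @ x, with K = len(thetas) - 1."""
    terms = chebyshev_terms(L_scaled, len(thetas) - 1)
    return sum(theta * (T @ x) for theta, T in zip(thetas, terms))

# Toy hypergraph with unit hyperedge weights (W = I).
H = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)
d_v, d_e = H.sum(axis=1), H.sum(axis=0)
Theta = np.diag(d_v ** -0.5) @ H @ np.diag(1.0 / d_e) @ H.T @ np.diag(d_v ** -0.5)
Delta = np.eye(3) - Theta

lam_max = 2.0                                   # approximation adopted from [12, 26]
L_scaled = (2.0 / lam_max) * Delta - np.eye(3)  # reduces to -Theta
print(np.allclose(L_scaled, -Theta))            # True

x = np.array([1.0, 0.0, -1.0])
y = chebyshev_filter(L_scaled, x, thetas=[0.5, 0.3, 0.2])   # a K = 2 filter
```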
Multi-order hypergraph convolutional network
Based on the spectral convolution theory of hypergraphs, we further develop operators that allow nodes and hyperedges to interact within higher order neighborhoods.
For higher order hypergraph convolution, we extend the neighborhood order from the perspective of spectral hypergraph convolution. Setting K = 2 in the truncated Chebyshev expansion of Eq. (4) yields the spectral 2-order hypergraph convolution operator:
where \(\theta _0\), \(\theta _1\), and \(\theta _2\) denote the parameters of filter \(\varvec{g}\), and \(\varvec{\Theta } = \textbf{D}_v^{-1/2} \textbf{H} \textbf{W} \textbf{D}_e^{-1} \textbf{H}^T \textbf{D}_v^{-1/2}\). Following [12, 26], to avoid over-parameterization we reduce the multiple parameters to a single parameter \(\theta \) as follows:
Thus, the spectral 2-order hypergraph convolution operator can be simplified as follows:
With the spectral 2-order hypergraph convolution operator, the 2-order hypergraph convolution of signal \(\textbf{X}\) can be defined as
where \(\theta \in \mathbb {R}^{C_\textrm{in} \times C_\textrm{out}}\) represents the learnable parameter.
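Substituting \(\tilde{\varvec{\Delta }} = -\varvec{\Theta }\) (which follows from \(\lambda _{\max } \approx 2\) and \(\varvec{\Delta } = \textbf{I} - \varvec{\Theta }\)) into the K = 2 expansion shows where the 2-order neighborhood enters. The worked derivation below is our own expansion of the general filter before any parameter sharing; it does not reproduce the single-parameter simplification used in the operator above.

```latex
\begin{aligned}
\tilde{\boldsymbol{\Delta}} &= \tfrac{2}{2}\boldsymbol{\Delta} - \mathbf{I} = -\boldsymbol{\Theta},\\
T_0(\tilde{\boldsymbol{\Delta}}) &= \mathbf{I},\qquad
T_1(\tilde{\boldsymbol{\Delta}}) = -\boldsymbol{\Theta},\qquad
T_2(\tilde{\boldsymbol{\Delta}}) = 2\tilde{\boldsymbol{\Delta}}\,T_1(\tilde{\boldsymbol{\Delta}}) - T_0(\tilde{\boldsymbol{\Delta}}) = 2\boldsymbol{\Theta}^2 - \mathbf{I},\\
\boldsymbol{g} \star \boldsymbol{x} &\approx \theta_0 \boldsymbol{x} - \theta_1 \boldsymbol{\Theta}\boldsymbol{x} + \theta_2\left(2\boldsymbol{\Theta}^2 - \mathbf{I}\right)\boldsymbol{x}
= (\theta_0 - \theta_2)\,\boldsymbol{x} - \theta_1\boldsymbol{\Theta}\boldsymbol{x} + 2\theta_2\boldsymbol{\Theta}^2\boldsymbol{x}.
\end{aligned}
```

The \(\varvec{\Theta }^2\) term propagates a signal through two successive node-hyperedge-node steps, so a node receives signals from vertices up to two hyperedge hops away, including vertices that share no hyperedge with it. Applying the recursion once more gives \(T_3(\tilde{\varvec{\Delta }}) = -4\varvec{\Theta }^3 + 3\varvec{\Theta }\) for the 3-order case.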
Similarly, the spectral 3-order hypergraph convolution operator is
where \(\hat{\theta }_0\), \(\hat{\theta }_1\), \(\hat{\theta }_2\), and \(\hat{\theta }_3\) denote the parameters of filter \(\varvec{g}\). We again use a single parameter \(\hat{\theta }\) to avoid over-parameterization as follows:
With the single parameter, the spectral 3-order hypergraph convolution operator can be simplified as
Thus, the 3-order hypergraph convolution of signal \(\textbf{X}\) can be defined as
where \(\theta \in \mathbb {R}^{C_\textrm{in} \times C_\textrm{out}}\) represents the learnable parameter.
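The layer below is a minimal NumPy sketch of a K-order spectral hypergraph convolution layer on the scaled Laplacian \(-\varvec{\Theta }\), used as a 2-order channel (K = 2) and a 3-order channel (K = 3). It is a sketch under stated assumptions, not the authors' implementation: it keeps a separate \(C_\textrm{in} \times C_\textrm{out}\) weight matrix per Chebyshev term instead of the single-parameter form described above (whose exact expression is not reproduced here), applies ReLU as the activation, and uses a hypothetical function name and random toy inputs.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def cheb_layer(Theta, X, weights, activation=relu):
    """K-order spectral hypergraph convolution on the scaled Laplacian -Theta.

    weights[k] is the (C_in, C_out) matrix for the k-th Chebyshev term, so
    K = len(weights) - 1. Hypothetical sketch: separate weights per term are used
    instead of the single shared parameter described in the text.
    """
    L = -Theta                              # scaled Laplacian when lambda_max ~ 2
    T_prev, T_curr = X, L @ X               # T_0(L) X and T_1(L) X
    out = T_prev @ weights[0]
    if len(weights) > 1:
        out = out + T_curr @ weights[1]
    for W_k in weights[2:]:
        # T_k(L) X = 2 L T_{k-1}(L) X - T_{k-2}(L) X
        T_prev, T_curr = T_curr, 2.0 * L @ T_curr - T_prev
        out = out + T_curr @ W_k
    return activation(out)

rng = np.random.default_rng(0)
H = np.array([[1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)   # toy hypergraph, W = I
d_v, d_e = H.sum(axis=1), H.sum(axis=0)
Theta = np.diag(d_v ** -0.5) @ H @ np.diag(1.0 / d_e) @ H.T @ np.diag(d_v ** -0.5)
X = rng.normal(size=(4, 8))
W2 = [rng.normal(scale=0.1, size=(8, 16)) for _ in range(3)]  # 2-order channel
W3 = [rng.normal(scale=0.1, size=(8, 16)) for _ in range(4)]  # 3-order channel
z_o2, z_o3 = cheb_layer(Theta, X, W2), cheb_layer(Theta, X, W3)
print(z_o2.shape, z_o3.shape)   # (4, 16) (4, 16)
```

Computing \(T_k(\tilde{\varvec{\Delta }})\textbf{X}\) through the recursion only needs repeated products with \(\varvec{\Theta }\), so the powers \(\varvec{\Theta }^2\) and \(\varvec{\Theta }^3\) never have to be formed explicitly; with a sparse \(\varvec{\Theta }\), the cost of a layer grows linearly with K.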
In MO-HGCN, the number of layers l in Eqs. (9) and (13) is set to 2, so the backbone of MO-HGCN is a two-layer multi-channel network. This design preserves the information of neighborhoods of different orders and helps nodes learn multi-order representations.
Inter-order attention
The spectral high-order hypergraph convolution operator allows each node to aggregate information from distant nodes and hyperedges. Such information may not be directly useful for learning, which makes it challenging to regulate the involvement of low-order information. We therefore propose an inter-order attention mechanism that measures the similarity between higher order and low-order neighborhoods. Unlike previous attention mechanisms, which compare neighboring nodes, ours compares different orders of the same node. In particular, we design an enhanced information channel based on 1-order hypergraph convolution that augments each node's own information. The convolution process of this channel is as follows:
where \(\varvec{h}\) represents the enhanced node embedding, and the \(\beta \in \mathbb {R}^N\) denotes a learnable parameter that assigns different self-loop weights to each node.
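The equation for this channel is not reproduced here. As an illustration only, the sketch below assumes the form \(\varvec{h} = \sigma ((\varvec{\Theta } + \textrm{diag}(\beta ))\textbf{X}\theta )\), i.e., a 1-order hypergraph convolution whose propagation matrix receives an extra learnable self-loop weight \(\beta _i\) for each node i. The function name, the ReLU activation, and the toy inputs are assumptions.

```python
import numpy as np

def enhanced_channel(Theta, X, W, beta, activation=lambda z: np.maximum(z, 0.0)):
    """Hypothetical 1-order channel with learnable per-node self-loop weights.

    Assumes h = act((Theta + diag(beta)) X W): beta[i] scales how much of node i's
    own transformed features is added to its aggregated neighborhood message.
    """
    XW = X @ W
    return activation(Theta @ XW + beta[:, None] * XW)   # diag(beta) @ XW via broadcasting

rng = np.random.default_rng(0)
H = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)      # toy hypergraph, W = I
d_v, d_e = H.sum(axis=1), H.sum(axis=0)
Theta = np.diag(d_v ** -0.5) @ H @ np.diag(1.0 / d_e) @ H.T @ np.diag(d_v ** -0.5)
X, W = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
beta = np.array([0.5, 1.0, 0.0])   # learnable in practice; fixed here for illustration
h = enhanced_channel(Theta, X, W, beta)
```

Under this assumed form, setting every \(\beta _i\) to zero recovers a plain 1-order propagation, while a larger \(\beta _i\) lets node i retain more of its own transformed features.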
As shown in Fig. 3, we obtain the node embeddings \(\varvec{z}^{o2}\) and \(\varvec{z}^{o3}\) of the 2-order and 3-order channels after the first convolution layer. The attention mechanism is then applied between \(\varvec{z}^{o2}\) and \(\varvec{h}\), and between \(\varvec{z}^{o3}\) and \(\varvec{h}\).
For node i, let \(\varvec{z}_{i}^{l} \in \mathbb {R}^K\) and \(\varvec{z}_{i}^{h}\in \mathbb {R}^K\) be its embeddings in the low-order channel and the higher order channel, respectively. The attention score on the j-th feature dimension is calculated by Eq. (15),
where \(\varvec{z}^{l}_{ij}\) and \(\varvec{z}^{h}_{ij}\) represent the j-th dimensional feature of node i, \(MLP(\bullet ):\mathbb {R}^{2 \times K} \rightarrow \mathbb {R}^K\) is a feature mapping function implemented as a multi-layer perceptron, and \(\Vert \) denotes concatenation. Accordingly, the attention score \(\alpha _{o2}\) between the 2-order channel and the enhanced information channel is calculated as
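Eq. (15) is not reproduced here, so the sketch below only illustrates one plausible reading of the description: for each node, the low-order embedding and the higher order embedding are concatenated, a two-layer perceptron maps the 2K-dimensional input back to K dimensions, and a sigmoid (our assumption) turns each output into a per-dimension score in (0, 1). The hidden size, the ReLU hidden activation, the parameter layout, and the function names are hypothetical.

```python
import numpy as np

def mlp(x, params):
    """Two-layer perceptron R^{2K} -> R^K (hidden size and ReLU are assumptions)."""
    W1, b1, W2, b2 = params
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def inter_order_scores(z_low, z_high, params):
    """Per-node, per-dimension attention scores in (0, 1), shape (N, K).

    Concatenates the low-order and higher order embeddings of the same node,
    maps them back to K dimensions, and squashes with a sigmoid (assumed).
    """
    logits = mlp(np.concatenate([z_low, z_high], axis=1), params)
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
N, K, hidden = 4, 16, 32
h = rng.normal(size=(N, K))        # enhanced information channel embedding
z_o2 = rng.normal(size=(N, K))     # 2-order channel embedding after the first layer
params = (rng.normal(scale=0.1, size=(2 * K, hidden)), np.zeros(hidden),
          rng.normal(scale=0.1, size=(hidden, K)), np.zeros(K))
alpha_o2 = inter_order_scores(h, z_o2, params)
```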
The attention score \(\alpha _{o3}\) between the 3-order channel and the enhanced information channel is calculated as
The attention scores \(\alpha _{o2}\) and \(\alpha _{o3}\) are then assigned to the higher order node embeddings \(\varvec{z}^{o2}\) and \(\varvec{z}^{o3}\) to emphasize the most relevant node representations across channels, as follows:
To preserve the original higher order information in the node embeddings, we fuse \(\varvec{\hat{z}}^{o2}\) and \(\varvec{\hat{z}}^{o3}\) with \(\varvec{z}^{o2}\) and \(\varvec{z}^{o3}\), respectively. The final higher order channel node embeddings \(\varvec{\tilde{z}}^{o2}\) and \(\varvec{\tilde{z}}^{o3}\) are obtained as follows:
where \(\lambda \in \mathbb {R}^1\) and \(\mu \in \mathbb {R}^1\) are learnable parameters restricted to [0, 1] that control the contributions of the two node embeddings.
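The exact reweighting and fusion equations are not reproduced here either. A minimal sketch, assuming element-wise reweighting \(\varvec{\hat{z}}^{o2} = \alpha _{o2} \odot \varvec{z}^{o2}\) followed by the convex combination \(\varvec{\tilde{z}}^{o2} = \lambda \varvec{\hat{z}}^{o2} + (1-\lambda )\varvec{z}^{o2}\) (and analogously with \(\alpha _{o3}\) and \(\mu \) for the 3-order channel), with the [0, 1] restriction enforced by clipping and a hypothetical function name, is:

```python
import numpy as np

def reweight_and_fuse(z_high, alpha, lam):
    """Scale higher order embeddings by attention, then mix with the originals.

    Assumes the fusion is lam * (alpha * z) + (1 - lam) * z with lam clipped to
    [0, 1]; this is an illustrative form, not the paper's exact equation.
    """
    lam = float(np.clip(lam, 0.0, 1.0))
    z_hat = alpha * z_high                 # element-wise attention weighting
    return lam * z_hat + (1.0 - lam) * z_high

rng = np.random.default_rng(0)
z_o2 = rng.normal(size=(4, 16))
alpha_o2 = rng.uniform(size=(4, 16))     # stands in for the attention scores
z_tilde_o2 = reweight_and_fuse(z_o2, alpha_o2, lam=0.3)
```

In practice the restriction to [0, 1] could equally be enforced with a sigmoid over an unconstrained parameter; clipping only keeps the sketch short.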
Self-supervised learning auxiliary task
Multi-order hypergraph convolutional networks enable nodes to learn multiple levels of representations, which further improves model performance. However, the channels are independent of each other, and higher order information usually contains varying degrees of redundancy. It is therefore worth considering how to extract distinctive information from the multi-order hypergraph convolutional network. Inspired by Deep Graph Infomax (DGI) [37], whose performance benefits from mutual information maximization, we extend mutual information maximization across orders to guide the model to reduce feature redundancy. Specifically, we perform contrastive learning between the enhanced information channel and each higher order channel. For the 2-order channel, the positive sample pair is \(\left( \varvec{h}_{ij}, \varvec{\tilde{z}}^{o2}_{ij}\right) \) and the negative sample pair is \(\left( \varvec{\hat{h}}_{ij}, \varvec{\tilde{z}}^{o2}_{ij}\right) \), where \(\varvec{\hat{h}}_{ij}\) denotes the negative sample obtained by row-wise shuffling. We use InfoNCE [18] as the loss function for contrastive learning, as follows:
where \(\mathcal {S}\left( \bullet \odot \bullet \right) \) denotes the discriminator function, implemented as the dot product.
For the 3-order channel, the positive sample pair is \(\left( \varvec{h}_{ij}, \varvec{\tilde{z}}^{o3}_{ij}\right) \) and the negative sample pair is \(\left( \varvec{\hat{h}}_{ij}, \varvec{\tilde{z}}^{o3}_{ij}\right) \), where \(\varvec{\hat{h}}_{ij}\) denotes the negative sample with row-wise shuffling. The InfoNCE loss function \(\mathcal {L}_{s2}\) is defined as
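The loss below is a hedged sketch of how the positive and negative pairs described above could enter an InfoNCE-style objective, assuming pairs are formed per node (row), the dot-product discriminator is taken over the feature dimension, and each node has one shuffled negative. With a single negative, the per-node InfoNCE term reduces to \(\log (1 + \exp (s^-_i - s^+_i))\), where \(s^+_i\) and \(s^-_i\) are the positive and negative scores. The function name and this per-node formulation are our assumptions.

```python
import numpy as np

def info_nce_shuffle(h, z, rng):
    """InfoNCE-style loss with one row-shuffled negative per node (hypothetical sketch).

    h: (N, K) enhanced-channel embeddings; z: (N, K) higher order channel embeddings.
    Positive pair (h[i], z[i]); negative pair (h[perm[i]], z[i]).
    Per node: -log(exp(s_pos) / (exp(s_pos) + exp(s_neg))) = log(1 + exp(s_neg - s_pos)).
    """
    h_neg = h[rng.permutation(h.shape[0])]   # row-wise shuffling
    s_pos = np.sum(h * z, axis=1)            # dot-product discriminator
    s_neg = np.sum(h_neg * z, axis=1)
    return float(np.mean(np.logaddexp(0.0, s_neg - s_pos)))  # stable log(1 + e^x)

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))
z_o2 = rng.normal(size=(8, 16))
loss_s1 = info_nce_shuffle(h, z_o2, rng)
```

Because row-wise shuffling is a random permutation, a node can occasionally be paired with its own row as a "negative"; this sketch tolerates such collisions rather than filtering them.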
Model learning
The node embeddings \(\varvec{\tilde{z}}^{o2}\) and \(\varvec{\tilde{z}}^{o3}\) are fed into the second layer of the multi-order hypergraph convolutional network. The outputs of the second layer are denoted as \(\varvec{X}^{(2)} \in \mathbb {R}^{N\times q}\) and \(\varvec{X}^{(3)}\in \mathbb {R}^{N\times q}\), where q is the number of classes. For node classification, we fuse the multi-channel information by summation, \(\varvec{\hat{X}} = \varvec{X}^{(2)} + \varvec{X}^{(3)}\) (Eq. (24)).
We then apply the softmax function to \(\varvec{\hat{X}}\) to predict the labels \(\hat{Y}\). The cross-entropy loss for node classification is defined as follows:
where \(\mathcal {V}_L\) is the set of labeled nodes and \(Y_{ij}\) denotes their ground-truth labels.
Therefore, the joint learning loss function is as follows:
where \(\eta _1\) and \(\eta _2\) are hyperparameters that control the contribution of self-supervised learning. Algorithm 1 summarizes the overall procedure of MO-HGCN.
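The sketch below strings the pieces of this subsection together: summation fusion of the two channel outputs, softmax cross-entropy restricted to the labeled nodes, and the weighted self-supervised terms. It assumes the joint objective has the form \(\mathcal {L} = \mathcal {L}_{ce} + \eta _1 \mathcal {L}_{s1} + \eta _2 \mathcal {L}_{s2}\), which matches the role of \(\eta _1\) and \(\eta _2\) described above but is not copied from the paper's equation. All names are hypothetical, and the self-supervised losses are passed in as precomputed numbers.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def joint_loss(X2, X3, labels, labeled_idx, loss_s1, loss_s2, eta1, eta2):
    """Hypothetical joint objective: CE on labeled nodes + eta1 * L_s1 + eta2 * L_s2.

    X2, X3: (N, q) outputs of the 2-order and 3-order channels (second layer).
    labels: (N,) integer class ids; only rows listed in labeled_idx are used.
    """
    X_hat = X2 + X3                                   # summation fusion
    logp = log_softmax(X_hat)
    ce = -np.mean(logp[labeled_idx, labels[labeled_idx]])
    return float(ce + eta1 * loss_s1 + eta2 * loss_s2)

def predict(X2, X3):
    """Predicted class per node: argmax of the softmax over the fused outputs."""
    return np.argmax(X2 + X3, axis=1)

N, q = 6, 3
labels = np.array([0, 1, 2, 0, 1, 2])
labeled_idx = np.array([0, 1, 2])          # stands in for the labeled set V_L
zeros = np.zeros((N, q))
loss = joint_loss(zeros, zeros, labels, labeled_idx,
                  loss_s1=0.7, loss_s2=0.7, eta1=0.005, eta2=0.005)
```

With all-zero outputs every class receives probability 1/q, so the cross-entropy term equals \(\log q\), which makes a convenient sanity check for an implementation.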
Experiments
In this section, we conduct experiments and validate our model by answering the following questions.
-
Q1: How does MO-HGCN perform on the node classification task?
-
Q2: How does high-order spectral hypergraph convolution perform compared to 1-order spectral hypergraph convolution?
-
Q3: How does the inter-order attention mechanism contribute to the performance of MO-HGCN?
-
Q4: How sensitive is the performance of MO-HGCN to its parameter settings?
-
Q5: How does the self-supervised learning component affect the effectiveness of MO-HGCN?
Datasets
For the semi-supervised node classification task on hypergraphs, we use the five hypergraph datasets provided by HyperGCN [41]. These datasets include co-citation networks and co-authorship networks. A summary of the datasets is shown in Table 1, and details are as follows:
-
Co-citation datasets: The co-citation hypergraphs are built from Cora, Citeseer, and PubMed. All documents are nodes, and the documents cited by the same document are grouped into a hyperedge. Hyperedges containing only one node are removed, and node features are the bag-of-words vectors of the documents.
-
Co-authorship datasets: The co-authorship hypergraphs are built from DBLP and Cora. All papers are nodes, and the papers written by the same author are grouped into a hyperedge. Node features are the bag-of-words vectors of the papers.
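The hyperedge construction rules above can be sketched as follows. The input formats (lists of (citing, cited) and (paper, author) pairs) and the function names are hypothetical; the sketch only mirrors the stated rules and is not the preprocessing code released with HyperGCN [41].

```python
from collections import defaultdict

def cocitation_hyperedges(citations):
    """citations: iterable of (citing_doc, cited_doc) pairs.

    Each citing document yields one hyperedge containing all documents it cites;
    hyperedges with a single node are dropped.
    """
    cited_by = defaultdict(set)
    for citing, cited in citations:
        cited_by[citing].add(cited)
    return [sorted(e) for e in cited_by.values() if len(e) > 1]

def coauthorship_hyperedges(authorship):
    """authorship: iterable of (paper, author) pairs; one hyperedge per author."""
    papers_of = defaultdict(set)
    for paper, author in authorship:
        papers_of[author].add(paper)
    return [sorted(e) for e in papers_of.values()]

cites = [(0, 1), (0, 2), (3, 2), (4, 1), (4, 2), (4, 3)]
print(cocitation_hyperedges(cites))   # [[1, 2], [1, 2, 3]]; document 3's singleton is dropped
```

The resulting hyperedge lists can be turned into an incidence matrix by setting \(h(v, e) = 1\) for every member v of hyperedge e.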
Baselines
We compare the proposed method with state-of-the-art baselines that combine hypergraph neural networks with a variety of neural network models. Details of these approaches are as follows:
-
MLP+HLR [2]: The method is a multi-layer perceptron using explicit hypergraph Laplacian for regularization.
-
HyperGCN [2]: HyperGCN reduces the hypergraph learning problem to a graph learning problem by approximating hyperedges with pair-wise edges, and provides a variant, FastHyperGCN, that reduces training time.
-
HyperSAGE [2]: HyperSAGE uses a two-level neural message passing strategy to propagate information in the hypergraph and combines the different neighborhood aggregation approaches of GraphSAGE [15].
-
HGNN [12]: HGNN introduces the symmetric normalized hypergraph Laplacian [47] operator by means of spectral hypergraph theory and provides a general framework for hypergraph neural networks.
-
UniGCN [19]: UniGNN unifies the message passing processes of graphs and hypergraphs in a single framework, naturally extending Graph Neural Network designs to hypergraphs; UniGCN is its GCN-based instance.
-
UniGAT [19]: The method extends the aggregation process of Graph Attention Networks [36] to hypergraphs, so that nodes learn the attention weights of neighboring hyperedges.
-
UniGIN [19]: The method adopts the mechanism of Graph Isomorphism Networks [40] to enhance expressiveness when nodes aggregate information from neighboring hyperedges.
-
UniSAGE [19]: The method is a variant of GraphSAGE [15], which adapts to different tasks by means of different aggregation functions.
-
H-ChebNet [46]: A variant derived from the general hypergraph spectral convolution framework combined with ChebNet [9].
-
H-APPNP [46]: H-APPNP is a hypergraph convolutional network with APPNP [27] as the backbone network.
-
H-SSGC [46]: The method extends SSGC [48] to the general hypergraph spectral convolution framework.
-
H-GCN [46]: H-GCN is a general hypergraph spectral convolution framework with Graph Convolutional Networks [26] as the backbone network.
-
H-GCNII [46]: H-GCNII extends GCNII [8], a deep network structure, to the general hypergraph spectral convolution framework.
Experiments settings
For the semi-supervised node classification task, we use accuracy (ACC) to evaluate the model. We train with the Adam optimizer for 2000 epochs. For the Cora (co-citation and co-authorship) and Citeseer datasets, the learning rate is 0.005 and the L2 regularization coefficient is 0.05. For the DBLP and PubMed datasets, the learning rate is 0.05 and the L2 regularization coefficient is 0.002. The hyperparameters \(\eta _1\) and \(\eta _2\) are both set to 0.005 on the two Cora datasets and to 0.001 on the other datasets. Each dataset has ten different training-test splits with the same training-test ratio, and we follow the evaluation protocol of [19]. For the baselines, we cite the results reported in their original papers, since the datasets and evaluation metric are the same.
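For reference, the settings stated above can be collected into a single configuration. The dataset keys and the helper function are hypothetical names; the numeric values are exactly those given in the text. If the model were implemented in PyTorch, the L2 regularization coefficient would typically be passed as the weight_decay argument of torch.optim.Adam, although the text does not say which framework was used.

```python
# Reported training settings; the dataset keys are hypothetical identifiers.
COMMON = {"optimizer": "Adam", "epochs": 2000, "num_splits": 10}

PER_DATASET = {
    "cora_cocitation":   {"lr": 0.005, "l2": 0.05,  "eta1": 0.005, "eta2": 0.005},
    "citeseer":          {"lr": 0.005, "l2": 0.05,  "eta1": 0.001, "eta2": 0.001},
    "pubmed":            {"lr": 0.05,  "l2": 0.002, "eta1": 0.001, "eta2": 0.001},
    "cora_coauthorship": {"lr": 0.005, "l2": 0.05,  "eta1": 0.005, "eta2": 0.005},
    "dblp":              {"lr": 0.05,  "l2": 0.002, "eta1": 0.001, "eta2": 0.001},
}

def training_config(dataset):
    """Merge the shared settings with the dataset-specific ones."""
    if dataset not in PER_DATASET:
        raise KeyError(f"unknown dataset: {dataset}")
    return {**COMMON, **PER_DATASET[dataset]}

cfg = training_config("dblp")
```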
Experimental results
Performance analysis
We report the mean accuracy and standard deviation in Table 2, with the best results in bold and the second-best results underlined. MO-HGCN achieves higher accuracy than the state-of-the-art, with improvements of 2.0%, 4.0%, 3.1%, and 1.0% on the co-citation Cora, Citeseer, PubMed, and co-authorship Cora datasets, respectively. The best-performing method on the co-authorship DBLP dataset is H-GCNII. These results answer question \({\textbf {Q1}}\): the proposed model outperforms the state-of-the-art baselines on four of the five datasets.
Unlike HyperGCN, which approximates hyperedges with pair-wise edges, MO-HGCN approximates the hypergraph structure by clique expansion, as HGNN does, and its multi-order neighborhoods further enlarge the receptive field of nodes and hyperedges. We attribute the advantage of MO-HGCN over HyperGCN and HGNN to this larger receptive field. The models that combine hypergraphs with other GNN methods absorb the advantages of different GNN structures and perform well; nevertheless, the results suggest that multi-order hypergraph convolutional networks with inter-order attention and self-supervised learning exploit the structural information of the hypergraph more fully. The standard deviations over multiple splits also show that MO-HGCN achieves stability comparable to that of the models combining hypergraph and GNN methods.
Ablation study
We report the performance of different channels and components of MO-HGCN in Fig. 4 to investigate their contributions. On the horizontal axis of Fig. 4, \({\textbf {1-order}}\) denotes the 1-order approximation of hypergraph convolution, which serves as the comparison for the higher order channels. \({\textbf {2-order}}\) and \({\textbf {3-order}}\) denote hypergraph convolutional networks that use only the 2-order channel and only the 3-order channel, respectively. \({\textbf {Multi-order}}\) denotes MO-HGCN consisting only of the 2-order and 3-order channels. \({\textbf {Inter-order attention}}\) denotes the multi-order hypergraph convolutional network with inter-order attention. \({\textbf {Self-supervised}}\) denotes the multi-order hypergraph convolutional network with both inter-order attention and self-supervised learning.
As Fig. 4 shows, the 2-order channel always performs better than the 3-order channel, and the 3-order channel outperforms the 1-order channel in most cases but is inferior to the 2-order channel. This indicates that the long-range information in higher order neighborhoods is not always directly useful. The results of \({\textbf {Multi-order}}\) show that fusing multi-order information allows the model to learn multiple levels of representation, which further improves performance. The self-supervised learning component delivers a larger boost than inter-order attention, suggesting that it plays a more prominent role in extracting distinctive information from higher order neighborhoods. The results in Fig. 4 answer question \({\textbf {Q2}}\): the channels and components contribute differently, and inter-order attention and self-supervised learning take full advantage of the rich information brought by multi-order neighborhoods.
Effectiveness of inter-order attention
We use box plots in Fig. 5 to report the distribution of attention scores in the inter-order attention mechanism, which addresses question \({\textbf {Q3}}\). The 2-order and 3-order boxes in Fig. 5 show the attention score distributions between the 2-order channel and the enhanced information channel, and between the 3-order channel and the enhanced information channel, respectively.
As shown in Fig. 5, the attention scores produced by the inter-order attention mechanism are concentrated in the low-score region, which indicates a large discrepancy between the node embeddings of the higher order channels and those of the enhanced information channel. This also suggests that nodes do receive information from higher order neighborhoods. Higher order node embeddings that are more similar to those of the enhanced information channel receive higher scores. As a result, information with low similarity to the 1-order neighborhood (which contains the node's own information) receives a lower weight when the embeddings are fused.
Effectiveness of self-supervised learning
We conduct parameter sensitivity experiments on the self-supervised learning hyperparameters \(\eta _1\) and \(\eta _2\) to answer question \({\textbf {Q4}}\), varying both over a range of values. Figure 6 reports the node classification accuracy of MO-HGCN for different values of \(\eta _1\) and \(\eta _2\), and shows that its performance is stable when both are chosen within a suitable range. Moreover, because self-supervised learning serves as an auxiliary task, MO-HGCN benefits more from smaller values of \(\eta _1\) and \(\eta _2\).
To investigate the impact of self-supervised learning on the multi-order hypergraph convolutional network, i.e., question \({\textbf {Q5}}\), we visualize the node embeddings generated by HyperGCN, HGNN, MO-HGCN without self-supervised learning, and MO-HGCN. As shown in Fig. 7, we use t-SNE [35] to project the node embeddings onto 2D coordinates, with each color representing a different node class. To obtain clear distributions, we test on the Cora and Citeseer datasets, which have relatively few nodes, and report the silhouette coefficients in Table 3.
In Fig. 7, the node embeddings of MO-HGCN form clear clusters, and Table 3 shows that they also achieve higher silhouette coefficients than those of MO-HGCN without self-supervised learning. This shows that self-supervised learning helps node embeddings capture distinctive information, which improves the separation between classes.
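Assuming the coefficients in Table 3 are silhouette coefficients, the NumPy sketch below shows how such a score is computed from embeddings and class labels (the t-SNE projection itself is not reproduced). scikit-learn's sklearn.metrics.silhouette_score computes the same quantity; this standalone version, with a hypothetical function name and toy points, only makes the definition explicit.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 classes).

    For sample i: a = mean distance to the other members of its own class,
    b = smallest mean distance to any other class, s = (b - a) / max(a, b).
    Samples in singleton classes get s = 0 (the usual convention).
    """
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    classes = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)     # excludes the zero self-distance
        b = min(D[i, labels == c].mean() for c in classes if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())

points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
well_separated = silhouette(points, [0, 0, 1, 1])   # close to 1
mixed = silhouette(points, [0, 1, 0, 1])            # negative: classes overlap
```

Values near 1 indicate compact, well-separated classes, while negative values indicate that samples lie closer to another class than to their own.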
Conclusions
In this paper, we propose a Multi-Order Hypergraph Convolutional Network Integrated with Self-Supervised Learning (MO-HGCN) to explore the potential of higher order neighborhoods in hypergraphs. MO-HGCN has a multi-channel network structure in which the higher order channels are built from spectral 2-order and spectral 3-order hypergraph convolution operators, respectively. We design an enhanced information channel that, through inter-order attention, preserves low-order neighborhood information. To mine distinctive information in the higher order channels, we introduce self-supervised learning as an auxiliary task that improves the performance of MO-HGCN. Experiments show that MO-HGCN is competitive with state-of-the-art baselines and that its inter-order attention and self-supervised learning components help exploit the potential of higher order neighborhoods. In future work, we would like to explore hypergraphs with heterogeneous nodes to investigate higher order neighborhood problems on heterogeneous hypergraphs.
References
Abu-El-Haija S, Perozzi B, Kapoor A, Alipourfard N, Lerman K, Harutyunyan H, Ver Steeg G, Galstyan A (2019) Mixhop: Higher-order graph convolutional architectures via sparsified neighborhood mixing. In: International conference on machine learning. PMLR, pp 21–29
Arya D, Gupta DK, Rudinac S, Worring M (2020) Hypersage: generalizing inductive representation learning on hypergraphs. arXiv preprint arXiv:2010.04558
Bachman P, Hjelm RD, Buchwalter W (2019) Learning representations by maximizing mutual information across views. Adv Neural Inf Process Syst 32
Bai S, Zhang F, Torr PH (2021) Hypergraph convolution and hypergraph attention. Pattern Recogn 110:107637
Bandyopadhyay S, Das K, Murty MN (2020) Line hypergraph convolution network: Applying graph convolution for hypergraphs. arXiv preprint arXiv:2002.03392
Banka A, Buzi I, Rekik I (2020) Multi-view brain hyperconnectome autoencoder for brain state classification. In: International workshop on predictive intelligence in medicine. Springer, pp 101–110
Bretto A (2013) Hypergraph theory. An introduction. Mathematical engineering. Springer, Cham
Chen M, Wei Z, Huang Z, Ding B, Li Y (2020) Simple and deep graph convolutional networks. In: International conference on machine learning. PMLR, pp 1725–1735
Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. Adv Neural Inf Process Syst 29
Dong Y, Sawin W, Bengio Y (2020) Hnhn: hypergraph networks with hyperedge neurons. arXiv preprint arXiv:2006.12278
Fan H, Zhang F, Wei Y, Li Z, Zou C, Gao Y, Dai Q (2021) Heterogeneous hypergraph variational autoencoder for link prediction. IEEE Trans Pattern Anal Mach Intell
Feng Y, You H, Zhang Z, Ji R, Gao Y (2019) Hypergraph neural networks. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 3558–3565
Gao Y, Zhang Z, Lin H, Zhao X, Du S, Zou C (2020) Hypergraph learning: methods and practices. IEEE Trans Pattern Anal Mach Intell
Georgiev D, Brockschmidt M, Allamanis M (2022) Heat: hyperedge attention networks. arXiv preprint arXiv:2201.12113
Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Adv Neural Inf Process Syst 30
Hao X, Li J, Guo Y, Jiang T, Yu M (2021) Hypergraph neural network for skeleton-based action recognition. IEEE Trans Image Process 30:2263–2275
Hassani K, Khasahmadi AH (2020) Contrastive multi-view representation learning on graphs. In: International conference on machine learning. PMLR, pp 4116–4126
Hjelm RD, Fedorov A, Lavoie-Marchildon S, Grewal K, Bachman P, Trischler A, Bengio Y (2018) Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670
Huang J, Yang J (2021) Unignn: a unified framework for graph and hypergraph neural networks. arXiv preprint arXiv:2105.00956
Huang J, Lei F, Wang S, Wang S, Dai Q (2021) Hypergraph convolutional network with hybrid higher-order neighbors. In: Chinese conference on pattern recognition and computer vision (PRCV). Springer, pp 103–114
Jaiswal A, Babu AR, Zadeh MZ, Banerjee D, Makedon F (2020) A survey on contrastive self-supervised learning. Technologies 9(1):2
Jiang J, Wei Y, Feng Y, Cao J, Gao Y (2019) Dynamic hypergraph neural networks. In: IJCAI, pp 2635–2641
Jia R, Zhou X, Dong L, Pan S (2021) Hypergraph convolutional network for group recommendation. In: 2021 IEEE international conference on data mining (ICDM). IEEE, pp 260–269
Jing L, Tian Y (2020) Self-supervised visual feature learning with deep neural networks: a survey. IEEE Trans Pattern Anal Mach Intell 43(11):4037–4058
Jing B, Park C, Tong H (2021) Hdmi: high-order deep multiplex infomax. In: Proceedings of the web conference 2021, pp 2414–2424
Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907
Klicpera J, Bojchevski A, Günnemann S (2018) Predict then propagate: graph neural networks meet personalized pagerank. arXiv preprint arXiv:1810.05997
Kong L, d’Autume CdM, Ling W, Yu L, Dai Z, Yogatama D (2019) A mutual information maximization perspective of language representation learning. arXiv preprint arXiv:1910.08350
Liu S, Lv P, Zhang Y, Fu J, Cheng J, Li W, Zhou B, Xu M (2020) Semi-dynamic hypergraph neural network for 3d pose estimation. In: IJCAI, pp 782–788
Liu X, Zhang F, Hou Z, Mian L, Wang Z, Zhang J, Tang J (2021) Self-supervised learning: generative or contrastive. IEEE Trans Knowl Data Eng
Liu X, Lei F, Xia G, Zhang Y, Wei W (2022) Adjmix: simplifying and attending graph convolutional networks. Complex Intell Syst 8(2):1005–1014
Lostar M, Rekik I (2020) Deep hypergraph u-net for brain graph embedding and classification. arXiv preprint arXiv:2008.13118
Payne J (2019) Deep hyperedges: a framework for transductive and inductive learning on hypergraphs. arXiv preprint arXiv:1910.02633
Sun FY, Hoffmann J, Verma V, Tang J (2019) Infograph: unsupervised and semi-supervised graph-level representation learning via mutual information maximization. arXiv preprint arXiv:1908.01000
Van der Maaten L, Hinton G (2008) Visualizing data using t-sne. J Mach Learn Res 9(11)
Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. arXiv preprint arXiv:1710.10903
Veličković P, Fedus W, Hamilton WL, Liò P, Bengio Y, Hjelm RD (2018) Deep graph infomax. arXiv preprint arXiv:1809.10341
Wang J, Ding K, Zhu Z, Caverlee J (2021) Session-based recommendation with hypergraph attention networks. In: Proceedings of the 2021 SIAM international conference on data mining (SDM). SIAM, pp 82–90
Xia X, Yin H, Yu J, Wang Q, Cui L, Zhang X (2021) Self-supervised hypergraph convolutional networks for session-based recommendation. In: Proceedings of the AAAI conference on artificial intelligence, vol 35, pp 4503–4511
Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks? arXiv preprint arXiv:1810.00826
Yadati N, Nimishakavi M, Yadav P, Nitin V, Louis A, Talukdar P (2019) Hypergcn: a new method for training graph convolutional networks on hypergraphs. Adv Neural Inf Process Syst 32
Yang C, Wang R, Yao S, Abdelzaher T (2020) Hypergraph learning with line expansion. arXiv preprint arXiv:2005.04843
Yi J, Park J (2020) Hypergraph convolutional recurrent neural network. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp 3366–3376
Yu J, Yin H, Li J, Wang Q, Hung NQV, Zhang X (2021) Self-supervised multi-channel hypergraph convolutional network for social recommendation. In: Proceedings of the web conference 2021, pp 413–424
Zhang R, Zou Y, Ma J (2019) Hyper-sagnn: a self-attention based graph neural network for hypergraphs. arXiv preprint arXiv:1911.02613
Zhang J, Li F, Xiao X, Xu T, Rong Y, Huang J, Bian Y (2022) Hypergraph convolutional networks via equivalency between hypergraphs and undirected graphs. arXiv preprint arXiv:2203.16939
Zhou D, Huang J, Schölkopf B (2006) Learning with hypergraphs: clustering, classification, and embedding. Adv Neural Inf Process Syst 19
Zhu H, Koniusz P (2020) Simple spectral graph convolution. In: International conference on learning representations
Acknowledgements
This work was partly supported by the National Natural Science Foundation of China (U1701266), the Guangdong Provincial Key Laboratory Project of Intellectual Property and Big Data (2018B030322016), Special Projects for Key Fields in Higher Education of Guangdong, China (2020ZDZX3077, 2022ZDJS013), the Natural Science Foundation of Guangdong Province, China (2022A1515011146), the Key Field R&D Plan Project of Guangzhou (202206070003), and the Youth Innovation Project of the Department of Education of Guangdong Province, China (File No. 2020KQNCX040).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Huang, J., Lei, F., Jiang, J. et al. Multi-order hypergraph convolutional networks integrated with self-supervised learning. Complex Intell. Syst. 9, 4389–4401 (2023). https://doi.org/10.1007/s40747-022-00964-7