8.1 Introduction

Hypergraph computation has been used in many areas such as image analysis [1,2,3] and recommendation [4,5,6]. In practical applications, hypergraphs are not necessarily small, and in many cases their size can be very large, so that hypergraph computation confronts complexity issues [7,8,9,10,11,12,13]. For instance, in medical image analysis, hypergraphs can be used to model the relationships among patches within an image or across different images. Here we take gigapixel whole-slide histopathological images (WSIs) as an example. The enormous number of pixels in a WSI poses a great challenge for medical image analysis. If we generated a hypergraph over the pixels of a WSI, the number of vertices would be on the billion level. Even if we sample patches from the WSIs, this number can still be in the tens of thousands, or even at the million level. Conventional hypergraph modeling methods are highly unlikely to be able to analyze data at such a scale. Another example is the recommender system, where graphs and hypergraphs have been widely used for their superior structural modeling capabilities. Meanwhile, the number of users and items on the Internet or in recommender systems can range from millions to billions and keeps increasing. Consequently, recommender systems are one of the typical playgrounds for large scale hypergraph applications. The large scale problem of hypergraphs is also encountered in many other areas, such as social network analysis, protein relation prediction, and so on.

Under such circumstances, hypergraph computation confronts the large scale issue, as modeling and computing on hypergraphs are of high complexity in general. To help address this issue, we introduce two types of hypergraph computation methods for handling large scale data in this chapter, namely the factorization-based hypergraph reduction method and the hierarchical hypergraph computation method, together with their applications in medical image analysis and recommender systems, respectively. The factorization-based hypergraph reduction decomposes the large scale hypergraph incidence matrix H into two low-dimensional matrices, thereby reducing the complexity; this method can support hypergraph computation with tens of thousands of vertices. The other method, i.e., hierarchical hypergraph computation, splits the vertices into several subsets and processes each sub-hypergraph separately. The results from these sub-hypergraphs are then combined following a hierarchical strategy, which can support hypergraph computation with millions of vertices and hyperedges. Part of the work introduced in this chapter has been published in [8].

8.2 Factorization-Based Big-Hypergraph Modeling

The space complexity of the incidence matrix \(\mathbf {H} \in \mathbb {R}^{N \times E}\) is \(\mathcal {O}(N \times E)\), which rises rapidly as the number of vertices (\(|\mathbb {V}| = N\)) and the number of hyperedges (\(|\mathbb {E}| = E\)) increase. Although hypergraphs can model high-order complex associations well, the incidence matrix cannot accommodate a sizable number of vertices under the traditional hypergraph modeling and transductive computation strategy. This is one typical bottleneck that limits the applications of hypergraph computation. To address this problem, the factorization-based hypergraph reduction method [8] is introduced to handle hypergraph modeling and computing with tens of thousands of vertices.

Matrix decomposition, which factorizes a high-dimensional matrix into the product of low-dimensional matrices, is an effective way to reduce dimensionality and has been applied in different areas such as spectral clustering [14] and recommendation algorithms [15]. For a large incidence matrix H of a hypergraph, matrix decomposition can likewise be used to find low-dimensional embeddings of each vertex and hyperedge and thus support large scale hypergraph computation.

As illustrated in Fig. 8.1, the factorization-based hypergraph reduction incorporates a factor embedding component that encodes the relationships between hyperedges and vertices into two latent semantic spaces. Due to the low dimension of the latent semantic space, it can handle more vertices and hyperedges accordingly.

Fig. 8.1

The pipeline of the factorization-based hypergraph reduction method. This figure is from [8]

The purpose of factorization is to reduce the incidence matrix H to two latent semantic spaces: the vertex-belonging-hyperedge embedding \({\mathbf {H}}_{v \in \mathbb {E}_v} \in \mathbb {R}^{N \times \varphi }\) and the hyperedge-containing-vertex embedding \({\mathbf {H}}_{e \supset \mathbb {V}_e} \in \mathbb {R}^{E \times \varphi }\), where \(\mathbb {E}_v\) and \(\mathbb {V}_e\) denote the set of hyperedges containing vertex v and the set of vertices in hyperedge e, respectively, and φ is a hyperparameter specifying the dimension of the latent semantic space. Figure 8.1 illustrates that the two latent semantic spaces aim to express all connections between vertices and hyperedges. This procedure is formulated as below:

$$\displaystyle \begin{aligned} \arg \underset{{\mathbf{H}}_{v \in \mathbb{E}_v}, {\mathbf{H}}_{e \supset \mathbb{V}_e}}{ \min } \Big \{ \big\| \mathbf{H} - {\mathbf{H}}_{v \in \mathbb{E}_v} {\mathbf{H}}_{e \supset \mathbb{V}_e}^\top \big\|_2^2 \Big \}. \end{aligned} $$
(8.1)

Consequently, the corresponding loss generated by the hypergraph dimensionality reduction can be written as

$$\displaystyle \begin{aligned} \mathbb{L}_\gamma= \big\| \mathbf{H} - {\mathbf{H}}_{v \in \mathbb{E}_v} {\mathbf{H}}_{e \supset \mathbb{V}_e}^\top \big\|_2^2. \end{aligned} $$
(8.2)
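To make the objective concrete, the following Python sketch factorizes an incidence matrix by gradient descent on the reconstruction error of Eq. (8.2). The function name, initialization, learning rate, and iteration count are illustrative assumptions; a truncated SVD of H would serve as a closed-form alternative, and in b-HGFN the two factors are optimized jointly with the downstream task.

```python
import numpy as np

def factorize_incidence(H, phi, iters=200, lr=1e-2, seed=0):
    """Factor H (N x E) into H_v (N x phi) and H_e (E x phi) by gradient
    descent on the reconstruction error of Eq. (8.2).  Illustrative sketch."""
    rng = np.random.default_rng(seed)
    N, E = H.shape
    H_v = 0.01 * rng.standard_normal((N, phi))
    H_e = 0.01 * rng.standard_normal((E, phi))
    for _ in range(iters):
        R = H_v @ H_e.T - H            # reconstruction residual
        grad_v = 2.0 * R @ H_e         # gradient w.r.t. H_v
        grad_e = 2.0 * R.T @ H_v       # gradient w.r.t. H_e
        H_v -= lr * grad_v
        H_e -= lr * grad_e
    loss = np.linalg.norm(H - H_v @ H_e.T) ** 2   # L_gamma of Eq. (8.2)
    return H_v, H_e, loss
```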

The hypergraph Laplacian matrix L is another crucial component of hypergraph computation, whose ordinary form is \(\mathbf {L} = \mathbf {I}-{\mathbf {D}}_v^{-1/2}\mathbf {H} \mathbf {W} {\mathbf {D}}_e^{-1} {\mathbf {H}}^\top {\mathbf {D}}_v^{-1/2}\). Since the incidence matrix H has been decomposed into two low-dimensional latent semantic spaces, the low-dimensional factorization-based hypergraph Laplacian \({\mathbf {L}}_F\) is formulated as

$$\displaystyle \begin{aligned} {\mathbf{L}}_F = \mathbf{I}-{\mathbf{D}}_v^{-1/2}{\mathbf{H}}_{v \in \mathbb{E}_v} \underbrace{{\mathbf{H}}_{e \supset \mathbb{V}_e}^\top \mathbf{W} {\mathbf{D}}_e^{-1} {\mathbf{H}}_{e \supset \mathbb{V}_e}}_{\varSigma\in \mathbb{R}^{\varphi \times \varphi}} {\mathbf{H}}_{v \in \mathbb{E}_v}^\top {\mathbf{D}}_v^{-1/2}, \end{aligned} $$
(8.3)

where \(\varSigma ={\mathbf {H}}_{e \supset \mathbb {V}_e}^\top \mathbf {W} {\mathbf {D}}_e^{-1} {\mathbf {H}}_{e \supset \mathbb {V}_e}\) is an intermediate latent feature multiplication term of dimension φ × φ. Because the latent semantic space dimension φ is significantly smaller than the total number of vertices and hyperedges, the intermediate term Σ functions as an extended control coefficient matrix, and the full N × N Laplacian never has to be materialized.
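As a rough illustration of this point, the sketch below applies \((\mathbf{I}-{\mathbf{L}}_F)\) to a vertex feature matrix using only the φ-dimensional factors. The function name and the convention of storing vertex features as rows are assumptions made for this example.

```python
import numpy as np

def latent_propagation(X, H_v, H_e, w, d_v, d_e):
    """Apply (I - L_F) to vertex features X (N x d) via the factors of
    Eq. (8.3); the N x N Laplacian is never built.
    H_v: N x phi, H_e: E x phi, w: hyperedge weights (E,),
    d_v / d_e: vertex / hyperedge degrees."""
    Sigma = H_e.T @ (np.diag(w / d_e) @ H_e)      # phi x phi intermediate term
    Xs = X / np.sqrt(d_v)[:, None]                # D_v^{-1/2} X
    out = H_v @ (Sigma @ (H_v.T @ Xs))            # H_v Sigma H_v^T D_v^{-1/2} X
    return out / np.sqrt(d_v)[:, None]            # leading D_v^{-1/2}
```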

The factorization-based hypergraph reduction can be integrated into hypergraph neural networks to support large scale computation with more than 10,000 vertices and hyperedges.

Here we illustrate an application of hypergraph computation to large scale medical image analysis, using whole-slide histopathological images for survival prediction. The goal is to predict the survival status of a patient by extracting survival-specific features from a whole-slide histopathological image. Unlike conventional images, WSI data can be very large, i.e., a single image may contain billions of pixels, and the correlations within these data are complicated. Therefore, hypergraph computation in this application meets the large scale issue. Existing medical image analysis models are designed for natural images of much smaller size, such as 256 × 256 pixels. For a model to handle WSI data, a number of patches of moderate size (e.g., 256 × 256) are usually sampled from each WSI first; these patches are then stacked and fed into a CNN-based feature extractor (e.g., VGG) to generate a global representation, as shown in Fig. 8.2. Subsequently, a regression model is applied to the global features to predict the survival score. These methods have an obvious drawback: the structure of the entire histopathological image is broken into pieces by patch sampling.

Fig. 8.2

(a) The whole-slide image for survival prediction; (b) Local feature extraction with convolution networks; (c) Feature aggregation with pairwise relation; (d) Global feature representation with high-order relation and multiple spaces. This figure is from [1]

It may be unrealistic to extract all of the structural information at the cellular level from gigapixel images, because a single histopathological image contains a massive amount of pixel data. A small number of image patches can be selected to generate graph-based models, from which a global feature can be extracted. However, the limited number of sampled patches restricts how much of the informative regions of the original image can be covered, so a considerable portion of fields with pathological features is missed. The incidence matrix, which represents the connectivity between vertices and hyperedges, is an essential component of the hypergraph neural network, and the large scale of vertices and hyperedges in the constructed hypergraph limits the application of HGNN [16].

Here, we introduce the Big-Hypergraph Factorization Neural Network (b-HGFN) [8], which uses factorization-based hypergraph reduction to address the above issue. It incorporates a factor embedding component that encodes the relationships between hyperedges and vertices into two latent semantic spaces, as illustrated in Fig. 8.3. Due to the low dimension of the latent semantic space, b-HGFN can handle more vertices and hyperedges. With the hypergraph reduction, b-HGFN can provide more accurate feature representations of histopathological images from more densely sampled patches. The first loss, generated by the hypergraph dimensionality reduction, is written as Eq. (8.2). The hypergraph Laplacian matrix L is another crucial component of b-HGFN, and the low-dimensional factorization-based hypergraph Laplacian \({\mathbf {L}}_F\) is formulated as Eq. (8.3). The hypergraph convolution layer of b-HGFN, denoted HGFConv, is represented as

$$\displaystyle \begin{aligned} \text{HGFConv}(\cdot)=D\Big[\sigma(\varTheta^{(\cdot)} {\mathbf{X}}^{(\cdot)}(\mathbf{I} - {\mathbf{L}}_F)) \Big ], \end{aligned} $$
(8.4)

where σ denotes the nonlinear activation function and D represents the dropout layer. By reorganizing the computation, the convolution operations of the intermediate layers are carried out entirely within the low-dimensional latent semantic space, which is denoted as

$$\displaystyle \begin{aligned} \left \{ \begin{array}{ll} \text{HGFConv}(0)&=D\Big[\sigma(\varTheta^{(0)} {\mathbf{X}}^{(0)}{\mathbf{D}}_v^{-1/2}{\mathbf{H}}_{v \in \mathbb{E}_v}\varSigma) \Big ] \\ \text{HGFConv}(1)&=D[\sigma(\varTheta^{(1)} {\mathbf{X}}^{(1)}\varSigma) ] \\ \dots & \\ \text{HGFConv}(L-1)&=D[\sigma(\varTheta^{(L-1)} {\mathbf{X}}^{(L-1)}\varSigma) ]\\ \text{HGFConv}(L)&=D\Big[\sigma(\varTheta^{(L)} {\mathbf{X}}^{(L)}\varSigma {\mathbf{H}}_{v \in \mathbb{E}_v}^\top {\mathbf{D}}_v^{-1/2}) \Big] \end{array} \right. . \end{aligned} $$
(8.5)

According to the HGFConv layers above, the hypergraph's high-dimensional connection relations can be embedded in the low-dimensional latent semantic spaces. To represent the global feature at the histopathological image level (i.e., \( \mathbf {X} \in \mathbb {R}^{1 \times C_{L+1}}\)), the output of the last HGFConv layer (i.e., \(\mathbf{X}^{(L+1)}\)) is squeezed by a pooling layer after a complete b-HGFN.
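A compact Python sketch of the layer stack in Eq. (8.5) is given below: the first layer maps the features into the φ-dimensional latent space, the intermediate layers operate entirely in that space, and the last layer maps back to the vertex domain before pooling. The function names, the ReLU activation, the inverted dropout, and the mean pooling are illustrative assumptions rather than the exact configuration of b-HGFN.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dropout(x, rate, rng):
    # inverted dropout; identity when rate == 0
    if rate <= 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def b_hgfn_forward(X0, H_v, Sigma, d_v, thetas, rate=0.1, seed=0):
    """Forward pass sketch of Eq. (8.5).
    X0: C_0 x N input features (channels as rows, as in Eq. (8.5));
    H_v: N x phi vertex factor; Sigma: phi x phi; d_v: vertex degrees;
    thetas: list of trainable matrices Theta^{(l)} with compatible shapes."""
    rng = np.random.default_rng(seed)
    Hv_norm = H_v / np.sqrt(d_v)[:, None]                  # D_v^{-1/2} H_v  (N x phi)
    P_in = Hv_norm @ Sigma                                 # entry into latent space

    X = dropout(relu(thetas[0] @ X0 @ P_in), rate, rng)    # C_1 x phi
    for Theta in thetas[1:-1]:
        X = dropout(relu(Theta @ X @ Sigma), rate, rng)    # stays C_l x phi
    X = dropout(relu(thetas[-1] @ X @ Sigma @ Hv_norm.T), rate, rng)   # C_{L+1} x N
    return X.mean(axis=1, keepdims=True).T                 # pooling -> 1 x C_{L+1}
```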

Fig. 8.3

An illustration of the Big-Hypergraph Factorization Neural Network (b-HGFN) that extracts global representation features from informative sampling patches. This figure is from [8]

The patient's survival duration is predicted by a fully connected neural network after the histopathological image's feature representation has been obtained. The hierarchical loss, which incorporates a list-wise loss, a pairwise loss, and a point-wise loss, has been experimentally demonstrated to be more effective for b-HGFN than using only the pairwise Bayesian Concordance Readjust (BCR) loss function. The point-wise loss function adopts the negative Cox log partial likelihood as

$$\displaystyle \begin{aligned} \mathbb{L}_\alpha=\sum_i \delta_i\left(-s_i+\log \sum_{j \in \{j:t_j\leq t_i\}}\exp (s_j)\right), \end{aligned} $$
(8.6)

where \(s_i\) and \(t_i\) represent the predicted duration and the ground truth, respectively, while the pairwise loss and list-wise loss refer to NDCGLoss2 derived from LambdaLoss [17] and the BCR loss [2]. Taking into consideration the loss function of hypergraph dimensionality reduction, the combination of all loss functions can be expressed as

$$\displaystyle \begin{aligned} \left \{ \begin{array}{ll} \mathbb{L}_\lambda &= \lambda \mathbb{L}_\alpha +(1-\lambda) \mathbb{L}_\beta\\ \mathbb{L}_\beta &= \{\text{NDCGLoss2}(\mathbb{S}, \mathbb{G}), \text{BCRLoss}(\mathbb{S}, \mathbb{G}) \} \\ \mathbb{L}_{\text{all}} &= \mathbb{L}_\gamma + \mathbb{L}_\lambda \end{array} \right. . \end{aligned} $$
(8.7)
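A minimal sketch of the point-wise term and the weighted combination is shown below. The ranking loss \(\mathbb{L}_\beta\) (NDCGLoss2 or BCR) is treated as an externally supplied value, and the function names and the loop-based risk-set computation are assumptions for illustration.

```python
import numpy as np

def cox_loss(s, t, delta):
    """Point-wise negative Cox log partial likelihood as in Eq. (8.6).
    s: predicted scores, t: observed durations, delta: event indicators (0/1)."""
    loss = 0.0
    for i in range(len(s)):
        risk_set = t <= t[i]                      # the set {j : t_j <= t_i} of Eq. (8.6)
        loss += delta[i] * (-s[i] + np.log(np.exp(s[risk_set]).sum()))
    return loss

def combined_loss(s, t, delta, l_beta, l_gamma, lam=0.5):
    """Weighted combination of Eq. (8.7).  l_beta is the ranking loss
    (NDCGLoss2 or BCR) computed elsewhere; l_gamma is the hypergraph
    reduction loss of Eq. (8.2); lam is the trade-off weight lambda."""
    l_alpha = cox_loss(s, t, delta)
    l_lambda = lam * l_alpha + (1.0 - lam) * l_beta
    return l_gamma + l_lambda
```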

In summary, the factorization-based hypergraph reduction encodes the relationships between vertices and hyperedges into two low-dimensional latent semantic spaces, and can therefore accommodate far more vertices and hyperedges. It can be used in HGNN [16] to alleviate the large scale problem and effectively supports hypergraph analysis with around 10,000 vertices and hyperedges.

8.3 Hierarchical Hypergraph Modeling

The factorization-based hypergraph reduction can effectively analyze hypergraphs with around 10,000 vertices and hyperedges, but it reaches its limit when the hypergraph grows to millions of vertices or hyperedges. Figure 8.4 shows a hierarchical hypergraph learning method for large scale hypergraphs with hierarchical labels. This hierarchical strategy enables hypergraph neural networks to handle millions of data points. In the following, it is introduced in detail.

Fig. 8.4

An illustration of the hierarchical hypergraph learning

For million-scale unstructured data, it is impractical to convert the whole dataset into a single large hypergraph to represent the correlations among samples, or to conduct the factorization-based reduction, since either would require an unrealistically large incidence matrix or a significant amount of computing memory. If the dataset has hierarchical labels, hierarchical hypergraph learning can be adopted to solve the problem. The original dataset \(\mathbf {X} \in \mathbb {R}^{N \times d}\) is first randomly and uniformly divided into several subsets of smaller and more affordable scale, where N denotes the size of the dataset and d the feature dimension of each sample. Each sample then serves as a vertex, and each vertex, together with its nearest neighbors, generates a hyperedge. In each subset, a sub-hypergraph is constructed using the k nearest neighbors algorithm (kNN), based on the Euclidean distance between the representations of each pair of vertices. The incidence matrix \({\mathbf {H}}_i \in \mathbb {R}^{\left |\mathbb {V}_i\right | \times \left |\mathbb {E}_i\right |}\) indicates the connections between the vertices and the hyperedges, with entries of 0 and 1.
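The division and kNN-based construction can be sketched as follows. The function name, the uniform random split, and the choice of one hyperedge per vertex (each vertex grouped with its k nearest neighbors) are illustrative assumptions.

```python
import numpy as np

def build_sub_hypergraphs(X, num_subsets, k, seed=0):
    """Randomly split the dataset and build a kNN sub-hypergraph per subset.

    Illustrative sketch: every vertex in a subset spawns one hyperedge that
    connects the vertex with its k nearest neighbors (Euclidean distance),
    so each H_i is a 0/1 incidence matrix of size |V_i| x |V_i|."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    subsets = np.array_split(order, num_subsets)

    sub_hypergraphs = []
    for idx in subsets:
        Xi = X[idx]                                    # vertex features of this subset
        dist = np.linalg.norm(Xi[:, None] - Xi[None, :], axis=-1)
        nn = np.argsort(dist, axis=1)[:, :k + 1]       # each vertex + its k neighbors
        Hi = np.zeros((len(idx), len(idx)))
        for e, members in enumerate(nn):               # hyperedge e centered on vertex e
            Hi[members, e] = 1.0
        sub_hypergraphs.append((idx, Xi, Hi))
    return sub_hypergraphs
```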

Given the initial feature matrix of vertices X as well as the corresponding incidence matrix H, we use \(\mathbb {G}_i = \langle \mathbb {V}_i,\mathbb {E}_i \rangle , (i=1,2,3,\ldots ,m)\) to represent the i-th sub-hypergraph, which contains \(\left | \mathbb {V}_i \right |\) vertices and \(\left | \mathbb {E}_i \right |\) hyperedges. To alleviate feature over-smoothing in the convolution operations, a residual connection [4] can be adopted to generate the updated vertex representations for the next layer of convolution, formulated as follows:

$$\displaystyle \begin{aligned} \widehat{\mathbf{X}}_i=\sigma(D^{-1/2}_iH_iW_i\mathbb{D}^{-1}_iH^\top_iD^{-1/2}_i{\mathbf{X}}_i\varTheta_i+{\mathbf{X}}_i), {} \end{aligned} $$
(8.8)

where \(D_i \in \mathbb {R}^{\left |\mathbb {V}_i\right |\times \left |\mathbb {V}_i\right |}\) and \(\mathbb {D}_i \in \mathbb {R}^{\left |\mathbb {E}_i\right |\times \left |\mathbb {E}_i\right |} \) are the vertex and hyperedge degree matrices, respectively. \(W_i = diag(w_1,w_2,\ldots ,w_{\left |\mathbb {E}_i\right |})\) denotes the trainable hyperedge weights, and \(\varTheta _i \in \mathbb {R}^{d\times d}\) is the trainable weight matrix for feature transformation.
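A Python sketch of the residual convolution layer in Eq. (8.8) is given below; the ReLU activation and the default weights are assumptions made for illustration.

```python
import numpy as np

def residual_hyperconv(X, H, W=None, Theta=None):
    """One residual hypergraph convolution layer following Eq. (8.8).

    X: |V| x d vertex features; H: |V| x |E| incidence matrix;
    W: hyperedge weights (length |E|); Theta: d x d feature transformation."""
    n_v, n_e = H.shape
    W = np.ones(n_e) if W is None else W
    Theta = np.eye(X.shape[1]) if Theta is None else Theta

    d_v = H @ W                        # weighted vertex degrees
    d_e = H.sum(axis=0)                # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d_v, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(d_e, 1e-12))

    A = Dv_inv_sqrt @ H @ np.diag(W) @ De_inv @ H.T @ Dv_inv_sqrt
    return np.maximum(A @ X @ Theta + X, 0.0)   # residual connection + ReLU
```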

Note that here we assume each sample has two hierarchical labels, a primary label and a secondary label, where the secondary label is a fine-grained category of the primary label. One special component in this first step is the “vertex belonging matrix,” denoted as \(\varGamma _{i} \in \mathbb {R}^{\left | \mathbb {V}_i \right |\times \mathbb {N}_2}\), where \(\mathbb {N}_2\) is the number of secondary labels. The matrix \(\varGamma_i\) is generated from the labels in the training set and serves as an input for the transductive learning method.

The global labels shared by all the subsets are usually on the order of hundreds, making it feasible to combine the independently learned label features of different groups. Having obtained the local latent high-order representations of the subsets in the previous hypergraph learning step, two aggregation operations can be conducted for primary and secondary label classification, respectively. The aggregation of local secondary labels can be formulated as follows:

$$\displaystyle \begin{aligned} {\mathbf{S}}_i = \varGamma^{\top }_{i}\widehat{\mathbf{X}}_i, \end{aligned} $$
(8.9)

where \({\mathbf{S}}_i\) denotes the aggregated local representation for the secondary labels, with dimension \(\mathbb {R}^{\mathbb {N}_2 \times d}\). Each row of the matrix \({\mathbf{S}}_i\) represents the latent feature of a specific secondary-label category in the i-th subset.

We then concatenate all of the local high-order vertices’ features \(\widehat {\mathbf {X}}_i\) to generate the global high-order vertices’ features \(\widehat {\mathbf {X}} \in \mathbb {R}^{\left |\mathbf {V} \right | \times d}\) as follows:

$$\displaystyle \begin{aligned} \widehat{\mathbf{X}} = \left[ \widehat{\mathbf{X}}_1^\top \| \widehat{\mathbf{X}}_2^\top \| \cdots \| \widehat{\mathbf{X}}_m^\top \right]^\top, \end{aligned} $$
(8.10)

where ⋅∥⋅ denotes the concatenating operation between two matrices. The local aggregated secondary features \({\mathbf {S}}_i \in \mathbb {R}^{\mathbb {N}_2\times d}\) can be further fused to form the global secondary features \(\mathbf {S}\in \mathbb {R}^{\mathbb {N}_2\times d}\) by average pooling, formulated as follows:

$$\displaystyle \begin{aligned} \mathbf{S} = {\mathbf{S}}_1 \oplus {\mathbf{S}}_2 \oplus \cdots \oplus {\mathbf{S}}_m, \end{aligned} $$
(8.11)

where ⊕ denotes the average pooling operation, calculating the mean value of the corresponding latent features from the local secondary labels.

The global high-order representation of the primary labels (\(\mathbf {P} \in \mathbb {R}^{\mathbb {N}_1 \times d}\)) is derived from the global features of the secondary labels, formulated as

$$\displaystyle \begin{aligned} \mathbf{P} = \varPhi \mathbf{S}, \end{aligned} $$
(8.12)

where \({\varPhi } \in \mathbb {R}^{\mathbb {N}_1 \times \mathbb {N}_2}\) encodes the belonging relations between secondary and primary labels, i.e., which secondary labels fall under each primary label.
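The aggregation pipeline of Eqs. (8.9)–(8.12) can be sketched as follows; the function name and the shape conventions are assumptions made for this example.

```python
import numpy as np

def hierarchical_aggregation(X_hats, Gammas, Phi):
    """Aggregation of Eqs. (8.9)-(8.12).

    X_hats: list of local high-order vertex features (|V_i| x d);
    Gammas: list of vertex belonging matrices (|V_i| x N2);
    Phi:    N1 x N2 matrix encoding which secondary labels belong to
            which primary label."""
    # Eq. (8.9): aggregate local secondary-label features per subset
    S_locals = [Gamma.T @ X_hat for Gamma, X_hat in zip(Gammas, X_hats)]

    # Eq. (8.10): concatenate local vertex features into the global matrix
    X_global = np.concatenate(X_hats, axis=0)              # |V| x d

    # Eq. (8.11): average pooling over subsets -> global secondary features
    S = np.mean(np.stack(S_locals, axis=0), axis=0)        # N2 x d

    # Eq. (8.12): global primary-label features from the secondary ones
    P = Phi @ S                                             # N1 x d
    return X_global, S, P
```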

Based on the results of the hypergraph convolution and global aggregation, classifiers consisting of fully connected layers can be trained on the concatenation of the updated high-order vertex representations and the global label representations. The augmented representations of the vertices are shown below:

$$\displaystyle \begin{aligned} \left\{\begin{array}{ll} \widetilde{\mathbf{X}}^{<1>}_i= \widehat{\mathbf{X}}_i\parallel\frac{1}{\mathbb{N}_1}\sum_{j=1}^{\mathbb{N}_1}{\mathbf{P}}_j \\ \widetilde{\mathbf{X}}^{<2>}_i= \widehat{\mathbf{X}}_i\parallel\frac{1}{\mathbb{N}_2}\sum_{j=1}^{\mathbb{N}_2}{\mathbf{S}}_j \end{array}\right.. \end{aligned} $$
(8.13)

The augmented features can then be used for downstream tasks and trained with the hierarchical labels in the training set. In the following, we introduce hierarchical hypergraph learning in recommendation.

Here, we introduce an application of hierarchical hypergraph learning to large scale user retrieval intention detection. Figure 8.5 shows the layout, which mainly consists of three steps: data division and local hypergraph modeling, latent high-order feature aggregation, and user intention prediction.

Fig. 8.5

An illustration of heterogeneous hypergraph neural network for extracting the high-order user behavior representations from the page-view level data

First, we randomly and uniformly divide the original dataset into several subsets. In this work, every query log serves as a vertex, and the relationships among query logs form hyperedges. As shown in Fig. 8.5, the whole original dataset and the divided subsets are denoted as V and \(\mathbb {V}_i\), respectively, where \(i\in \{1, 2, \ldots , m \}\). In each subset, a sub-hypergraph is constructed as introduced above. Note that the initial semantic embeddings of the vertices (\({\mathbf {X}}_i \in \mathbb {R}^{\left | \mathbb {V}_i \right | \times d}\)) are extracted by well-known pre-trained models, such as BERT [18], where d denotes the dimension of the embeddings.

The hierarchical hypergraph learning can then be used to conduct the user intention prediction. In our research, the user intentions are categorized into two levels, i.e., the primary label and the secondary label, which is the fine-grained category of the primary label. After applying the hierarchical model, the features \(\widetilde {\mathbf {X}}^{<1>}_i\) and \(\widetilde {\mathbf {X}}^{<2>}_i\) can be obtained.

In this application, the multi-class classification is converted into multiple binary classification problems to improve the performance of the model. We use \(\mathbb {C} = \left \{ \mathbb {C}_1, \mathbb {C}_2,\ldots ,\mathbb {C}_{\mathbb {N}}\right \}, \mathbb {N} \in \{\mathbb {N}_1,\mathbb {N}_2\}\) to denote the collection of user intentions. The original multiple labels are thus converted into two labels, 0 and 1: for each intention \(\mathbb{C}_l\), the samples carrying that intention are labeled 1 and all other samples are labeled 0. Each classifier is trained with a multi-layer perceptron (MLP) and the sigmoid activation function to perform label prediction on the newly allocated binary labels, formulated as follows:

$$\displaystyle \begin{aligned} \left\{\begin{array}{ll} \widehat{\mathbb{Y}}_1 = \sigma(\widetilde{\mathbf{X}}^{<1>}_i\varTheta_{f1}+b_1) \\ \widehat{\mathbb{Y}}_2 = \sigma(\widetilde{\mathbf{X}}^{<2>}_i\varTheta_{f2}+b_2),\end{array}\right. \end{aligned} $$
(8.14)

where \(\varTheta_{f1}\) and \(\varTheta_{f2}\) are the trainable transformation matrices, \(b_1\) and \(b_2\) are the biases, and σ is the activation function. \(\widehat {\mathbb {Y}}_1\) and \(\widehat {\mathbb {Y}}_2\) denote the predictions of the primary and secondary user intentions, respectively.

To supervise and optimize the trainable parameters, we apply the cross-entropy loss function in the training procedure:

$$\displaystyle \begin{aligned} \mathbb{L} = \mathbb{C}\mathbb{E}(\mathbb{Y}_1,\widehat{\mathbb{Y}}_1)+\mathbb{C}\mathbb{E}(\mathbb{Y}_2,\widehat{\mathbb{Y}}_2), \end{aligned} $$
(8.15)

where \(\mathbb {Y}_1\) and \(\mathbb {Y}_2\) denote the ground truth of the primary and secondary user intentions, respectively. Once all of the classifiers have been trained, each test sample can be predicted to obtain a list of scores for both primary and secondary user intentions.
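A sketch of the prediction heads and the training loss in Eqs. (8.13)–(8.15) is shown below. Single-layer heads with a sigmoid are used for brevity where the method uses an MLP, and the function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def intention_scores(X_hat_i, P, S, Theta_f1, b1, Theta_f2, b2):
    """Prediction heads of Eqs. (8.13)-(8.14): each vertex feature is
    concatenated with the mean of the global primary (P) or secondary (S)
    label features, then passed through a sigmoid head."""
    X1 = np.concatenate([X_hat_i, np.tile(P.mean(0), (len(X_hat_i), 1))], axis=1)
    X2 = np.concatenate([X_hat_i, np.tile(S.mean(0), (len(X_hat_i), 1))], axis=1)
    return sigmoid(X1 @ Theta_f1 + b1), sigmoid(X2 @ Theta_f2 + b2)

def bce(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy used for each one-vs-rest classifier."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Total loss of Eq. (8.15): sum of the primary- and secondary-level terms
# loss = bce(Y1, Y1_hat) + bce(Y2, Y2_hat)
```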

To summarize, the hierarchical hypergraph learning method can handle large scale hypergraphs with hierarchical labels: it divides the dataset into multiple sub-hypergraphs, and hierarchical aggregation is performed based on the hierarchical labels. The hierarchical hypergraph can be integrated with the hypergraph neural network to handle millions of data points.

8.4 Summary

This chapter describes two kinds of large scale hypergraph computation methods, i.e., factorization-based hypergraph reduction and hierarchical hypergraph learning. The factorization-based hypergraph reduction is based on the strategy of factorization, which decomposes the large scale hypergraph into low-dimensional embeddings of vertices and hyperedges. It can support the processing of hypergraphs with nearly 10,000 vertices or hyperedges. The hierarchical hypergraph learning is used to analyze hypergraphs with hierarchical labels, which divides a dataset into multiple sub-hypergraphs, and hierarchical aggregation is performed based on hierarchical labels. This method can support millions of data points. We also introduce two applications as examples, i.e., whole-slide image analysis and recommendation, to illustrate the usage of these two algorithms in practice. There are some other large scale hypergraph application scenarios, such as community discovery [19], spectral clustering [20], etc.