10.1 Introduction

Over the past few decades, massive amounts of biological and medical data have been generated owing to the rapid development of big data techniques. These data can be used for tasks such as disease gene analysis, disease risk assessment, and targeted drug discovery, and they further contribute to disease prevention, early diagnosis, and treatment. Biological and medical data are complex, heterogeneous, and multi-modal, with widespread inter- and intra-data correlations. For example, in early disease diagnosis, patients with similar medical image appearance may also share similar disease conditions; different modalities of the medical images of the same patient, such as MRI and CT, may exhibit disease characteristics from different perspectives; and the patches within gigapixel histopathological images may have implicit collaborative associations that reveal patients’ potential health risks. Modeling the correlations behind these data is therefore essential for medical and biological applications.

Hypergraphs, with their flexible hyperedges, provide a possible solution for modeling such complex correlations within medical and biological data. Given the observed data, the hypergraph structure can be generated using the previously mentioned methods. Multi-modal or heterogeneous data can be incorporated naturally by concatenating hyperedge groups, so that the complementary information of these data is used discriminatively. The applications of hypergraph computation in medical and biological tasks can typically be summarized as follows: (1) modeling the medical images, the patches, or the biological entities as vertices, and connecting them with hyperedges following their feature similarity or high-order topological links; (2) exploring the high-order correlations between data using hypergraph label propagation or hypergraph neural networks so as to enhance the vertex representations; and (3) deploying these representations on the medical and biological tasks, such as medical image retrieval, disease identification, cancer tissue classification, survival prediction, and medical image segmentation.

In this chapter, we discuss four typical applications of hypergraph computation in the medical and biological fields, i.e., computer-aided diagnosis, survival prediction with histopathological images, drug discovery, and medical image segmentation. In computer-aided diagnosis, four specific applications are included, i.e., MCI identification [1], medical image retrieval for MCI [2], autism spectrum disorder identification using brain functional networks [3], and the identification of COVID-19 by CT imaging [4]. For survival prediction with histopathological images, two techniques targeting different cases are presented, including ranking-based survival prediction [5] and multi-hypergraph modeling for survival prediction [6]. In drug discovery, a heterogeneous hypergraph-based drug–target interaction prediction technique [7] is presented. For medical image segmentation, we introduce the hierarchical hypergraph patch labeling method. Part of the work introduced in this chapter has been published in [1,2,3,4,5,6,7,8].

10.2 Computer-Aided Diagnosis

With the advancement of artificial intelligence and the widespread use of medical imaging data, including MRI, CT scans, and histopathological images, computer-aided diagnosis has made clinical diagnosis far more convenient. Its main goal is to provide clinicians with a preliminary examination of patients in order to increase diagnostic accuracy, avoid missed diagnoses, and improve work efficiency. Despite great advances in machine learning and deep learning research, many challenges remain in this field. These include the inadequate use of information shared among patients and among different forms of medical images, the persistence of noisy data (such as variations across CT manufacturers and patients’ movement during imaging), and the confusion of cases in the early stages of illness.

Traditional approaches frequently ignore the relationships among patients and consider each patient in isolation. However, if the MRI or CT features of two patients are similar, their disease conditions are likely to be similar as well, so the illness information of patients with similar medical images can help improve the accuracy of computer-aided diagnosis. Because a hyperedge, unlike a graph edge, can connect two or more vertices, hypergraphs can represent high-order illness connections among multiple individuals, which offers a potential solution to the first challenge.

Computer-aided diagnosis with medical images typically consists of three main steps. The first step is image pre-processing, which mostly consists of enhancing visual information, filtering out the background, and separating the region of interest from blank areas to lessen the interference of irrelevant regions. The second step is to extract features from the region of interest. Imaging features, including infection lesion count, mean lesion area, lesion density, and morphological aspects, are extracted from the images since they are informative and contain task-independent information. The final step is to use machine learning, deep learning, or other statistical approaches to diagnose patients and to identify disease types and lesion types with the features gathered in the previous steps.

The subsections that follow introduce the use of hypergraph computing techniques in computer-aided diagnosis. Four specific applications are covered, namely MCI identification using MRI [1], medical image retrieval [2], COVID-19 identification using CT imaging [4], and ASD identification using brain functional networks [3]. First, we present a strategy for creating a hypergraph for each MRI sequence and modeling the correlations among patients using the information shared by several MRI sequences. Second, we explain how to learn multi-graph combination weights to discover the association between query subjects and the existing subject classes, which improves the precision of medical image retrieval. Third, we describe the uncertainty vertex-weighted hypergraph learning approach for distinguishing COVID-19 from other types of pneumonia. Finally, we show the application of dynamic hypergraph learning methods to diagnosing autism in children using multi-modal functional connectivity.

10.2.1 MCI Identification Using MRI

Identifying the early phase of Alzheimer’s disease (AD), i.e., mild cognitive impairment (MCI), is an important but challenging task since AD is a relatively common form of dementia in seniors. Research has demonstrated that combining data from various modalities can improve the accuracy of diagnosing AD/MCI. The hypergraph computing approach described below therefore uses clinically routine scans that capture multiple MR sequences reflecting various aspects of brain structure or function, and attempts to combine them optimally.

The centralized hypergraph learning (CHL) method [1] integrates multiple types of imaging data in a semi-supervised manner to estimate the correlations among subjects, which indicate how likely subjects are to belong to the same class. This improves the utilization of multi-modal data; an overview is shown in Fig. 10.1. In contrast to ordinary graphs, hypergraphs propagate information through hyperedges that each connect two or more vertices. By selecting the nearest neighbors in the feature space, they can capture higher-order relationships among subjects, i.e., whether a set of subjects shares common information. Each subject can thus make the most of the knowledge from the MR sequences by jointly optimizing the correlations among subjects and the hyperedge weights. The process is presented in two stages: the construction of the centralized hypergraphs from pre-processed data, and centralized hypergraph learning.

Fig. 10.1
A block diagram. It has multimodal imaging data, feature extraction and selection, hypergraph construction, and multimodal centralized hypergraph learning through centralized hypergraph learning and multimodal fusion producing diagnosis results.

A pipeline to classify MCI or NC from multi-modal imaging data using centralized hypergraph learning. This figure is from [1]

Different types of imaging data from patients with MCI and normal control (NC) need to be pre-processed as features before such data are used to construct the hypergraphs. Thereafter, a hypergraph \(\mathbb {G}_i=\langle \mathbb {V}_i, \mathbb {E}_i, {\mathbf {W}}_i \rangle \) is constructed for each type of imaging data, where each subject is considered as a vertex, while the star expansion procedure is used to generate hyperedges. In particular, every vertex in each feature space is taken in turn as the central vertex for generating a hyperedge, which consists of the vertices located within distance \(\varphi \bar {d}\) of the center vertex, where \(\varphi\) is a hyperparameter and \(\bar {d}\) is the mean vertex distance in the feature space. The hypergraph incidence matrix \({\mathbf {H}}_i\) produced by the star expansion procedure is formalized as

$$\displaystyle \begin{aligned} {\mathbf{H}}_i(v, e) = \begin{cases} \exp{\Big (-\frac{d_i(v,v_c)}{0.1\bar{d}_i} \Big )} & \text{if}~v \in e \\ 0 & \text{otherwise} \end{cases} , \end{aligned} $$
(10.1)

where \(d_i(v, v_c)\) represents the distance from the vertex \(v\) to the corresponding center vertex \(v_c\), and \(\bar {d}_i\) is the mean vertex distance in the feature space of the \(i\)-th type of imaging data. Note that the hyperedge weights \({\mathbf {W}}_i\) are initialized to the same value, e.g., 1, when the hypergraph is generated.
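The construction in Eq. (10.1) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation of [1]: the function name `build_incidence`, the default value of `phi`, and the reading of the mean distance as the average over all distinct vertex pairs are assumptions made here.

```python
import numpy as np

def build_incidence(X, phi=1.0):
    """Distance-thresholded hyperedges, one per centre vertex (Eq. 10.1).

    X   : (n, d) feature matrix, one row per subject (vertex), n >= 2.
    phi : neighbourhood scale (hypothetical default); vertices within
          phi * mean distance of the centre join its hyperedge.
    Returns H of shape (n, n), where H[v, c] is the weight of vertex v in
    the hyperedge centred at vertex c, or 0 if v is not a member.
    """
    # Pairwise Euclidean distances between all subjects.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    n = X.shape[0]
    # Mean distance over distinct pairs (assumed reading of "mean distance").
    d_bar = D[np.triu_indices(n, k=1)].mean()
    # member[v, c] is True when v lies within phi * d_bar of centre c.
    member = D <= phi * d_bar
    # Members get a weight that decays with distance to the centre.
    return np.where(member, np.exp(-D / (0.1 * d_bar)), 0.0)
```

Each centre vertex always belongs to its own hyperedge with weight 1, since its distance to itself is zero.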

For the MCI diagnosis task, which is regarded as a binary classification problem, the various imaging data are used to model the correlations among subjects with the centralized hypergraph learning method. At each step, one of the four hypergraphs created from the four types of data is selected as the core hypergraph, with the others offering additional information for updating the hypergraphs. If the \(j\)-th hypergraph is the core, we obtain the \(j\)-th centralized hypergraph, and the relationships among the vertices are learned by solving

$$\displaystyle \begin{aligned} \arg \underset{{\mathbf{F}}_j, {\mathbf{W}}_{i}}{\min} &\Big \{\varOmega_j^c({\mathbf{F}}_j) + \lambda\mathbb{R}_{emp}({\mathbf{F}}_j) + \mu\sum_i\sum_{e\in \mathbb{E}_i}{\mathbf{W}}_i(e)^2 \Big \}\\ \mathit{s}.\mathit{t}.~&{\mathbf{H}}_i diag({\mathbf{W}}_i)=diag({\mathbf{D}}_i^v), diag({\mathbf{W}}_i)\geq 0,\\ \end{aligned}, $$
(10.2)

where \(\varOmega _j^c({\mathbf {F}}_j) \) is the regularizer that smooths the correlations among vertices, \(\mathbb {R}_{emp}\) represents the empirical loss, \(\sum _i\sum _{e\in \mathbb {E}_i}{\mathbf {W}}_i(e)^2\) represents an \(\ell_2\)-norm regularizer on the hyperedge weights, and \({\mathbf {D}}_i^v\) represents the vertex degree matrix. By assigning different weights \(\alpha_1\) and \(\alpha_2\) to the core hypergraph and the others, respectively, the regularizer term can be formulated as

$$\displaystyle \begin{aligned} \varOmega_j^c({\mathbf{F}}_j) = \alpha_1 \varOmega_j({\mathbf{F}}_j) + \alpha_2 \sum_{i\neq j}\varOmega_i({\mathbf{F}}_j), \end{aligned} $$
(10.3)

where \(\varOmega_i({\mathbf {F}}_j)\) is equal to \({\mathbf {F}}_j^\top (\mathbf {I}-\varTheta _i){\mathbf {F}}_j\) with \(\varTheta _i={\mathbf {D}}_v^{-1/2}\mathbf {H} \mathbf {W} {\mathbf {D}}_e^{-1} {\mathbf {H}}^\top {\mathbf {D}}_v^{-1/2}\) computed from the \(i\)-th hypergraph. Consequently, the regularizer is rewritten as \(\varOmega _j^c({\mathbf {F}}_j) ={\mathbf {F}}_j^\top \varDelta _j^c{\mathbf {F}}_j \) with \(\varDelta _j^c=\mathbf {I}-(\alpha _1\varTheta _j+\alpha _2\sum _{i\neq j}\varTheta _i)\).

The optimization of Eq. (10.2) consists of two steps. First, we optimize the relevance matrix \({\mathbf {F}}_j\) with \({\mathbf {W}}_i\) fixed as

$$\displaystyle \begin{aligned} \arg \underset{{\mathbf{F}}_j}{\min} \Big \{\varOmega_j^c({\mathbf{F}}_j) + \lambda\mathbb{R}_{emp}({\mathbf{F}}_j) \Big \}, \end{aligned} $$
(10.4)

which has the closed-form solution \({\mathbf {F}}_j=\frac {\lambda }{1+\lambda }(\mathbf {I}-\frac {1}{1+\lambda }(\alpha _1\varTheta _j+\alpha _2\sum _{i\neq j}\varTheta _i))^{-1}\mathbf {Y}\), where \(\mathbf {Y}\) is the label matrix. Next, we optimize the hyperedge weights \({\mathbf {W}}_i\) with \({\mathbf {F}}_j\) fixed as

$$\displaystyle \begin{aligned} \arg \underset{{\mathbf{W}}_{i}}{\min} &\Big \{\varOmega_j^c({\mathbf{F}}_j) + \mu\sum_i\sum_{e\in \mathbb{E}_i}{\mathbf{W}}_i(e)^2 \Big \}\\ \mathit{s}.\mathit{t}.~&{\mathbf{H}}_i diag({\mathbf{W}}_i)=diag({\mathbf{D}}_i^v), diag({\mathbf{W}}_i)\geq 0, \end{aligned} $$
(10.5)

which can be optimized by quadratic programming.
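The closed-form update of the relevance matrix can be illustrated as follows. This is a sketch under stated assumptions, not the code of [1]: the helper names `hypergraph_theta` and `centralized_relevance` are hypothetical, the default values of `lam`, `alpha1`, and `alpha2` are arbitrary, and the weights are assumed to be normalized so that \(\alpha_1\) plus \(\alpha_2\) times the number of non-core hypergraphs is at most 1, which keeps the linear system well conditioned.

```python
import numpy as np

def hypergraph_theta(H, w=None):
    """Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} for one hypergraph.

    H : (n, m) non-negative incidence matrix; every vertex must belong to
        at least one hyperedge and every hyperedge must be non-empty.
    w : (m,) hyperedge weights, all ones if omitted.
    """
    m = H.shape[1]
    w = np.ones(m) if w is None else np.asarray(w, dtype=float)
    dv = H @ w                       # vertex degrees
    de = H.sum(axis=0)               # hyperedge degrees
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    left = dv_inv_sqrt[:, None] * H * (w / de)    # Dv^{-1/2} H W De^{-1}
    return left @ (H.T * dv_inv_sqrt[None, :])    # ... H^T Dv^{-1/2}

def centralized_relevance(thetas, j, Y, lam=1.0, alpha1=0.7, alpha2=0.1):
    """Closed-form F_j when hypergraph j is the core and weights are fixed."""
    n = Y.shape[0]
    others = sum(t for i, t in enumerate(thetas) if i != j)
    theta_c = alpha1 * thetas[j] + alpha2 * others
    A = np.eye(n) - theta_c / (1.0 + lam)
    # Solve the linear system instead of forming the inverse explicitly.
    return (lam / (1.0 + lam)) * np.linalg.solve(A, Y)
```

Using `np.linalg.solve` rather than an explicit matrix inverse is a numerical design choice; both yield the same \({\mathbf{F}}_j\) up to rounding.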

To best integrate the data from the various MRI sequences, we learn a weight for each centralized hypergraph by minimizing the total hypergraph Laplacian, which is expressed as

$$\displaystyle \begin{aligned} \arg \underset{\rho_i}{\min} &\Big \{\sum \rho_i \varOmega_i^c({\mathbf{F}}_i) + \eta \sum \rho_i^2 \Big \} \\ \mathit{s}.\mathit{t}.~&\sum\rho_i=1, \end{aligned}, $$
(10.6)

where \(\rho_i\) represents the weight of the \(i\)-th centralized hypergraph, and \(\eta\) represents the trade-off parameter between the Laplacian and the \(\ell_2\)-norm regularizer. With the centralized hypergraph weights determined, the overall relevance matrix is \(\mathbf {F}=\sum \rho _i {\mathbf {F}}_i\), whose entries can be used to classify each subject.
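Because Eq. (10.6) is a quadratic objective with a single equality constraint, setting the gradient of its Lagrangian to zero gives \(\rho_i = 1/m + (\bar{\varOmega} - \varOmega_i)/(2\eta)\), where \(m\) is the number of centralized hypergraphs and \(\bar{\varOmega}\) is the mean regularizer value. The sketch below implements this formula; the function names are hypothetical, and since Eq. (10.6) states no non-negativity constraint, none is enforced here, so a small \(\eta\) can produce negative weights.

```python
import numpy as np

def fusion_weights(omegas, eta=1.0):
    """Minimise sum_i rho_i * omega_i + eta * sum_i rho_i**2
    subject to sum_i rho_i = 1 (Lagrangian closed form)."""
    omegas = np.asarray(omegas, dtype=float)
    return 1.0 / omegas.size + (omegas.mean() - omegas) / (2.0 * eta)

def fuse(rho, Fs):
    """Overall relevance matrix F = sum_i rho_i * F_i.

    Fs : array of shape (m, n, c) stacking the m relevance matrices.
    """
    return np.tensordot(rho, np.asarray(Fs), axes=1)

# Hypergraphs with a smaller regularizer value receive a larger weight.
rho = fusion_weights([1.0, 2.0, 3.0], eta=2.0)
```

With the values above, the weights are 7/12, 1/3, and 1/12, which sum to one.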

In this subsection, we have introduced a centralized hypergraph learning method to model patient relationships for MCI identification. In this framework, a hypergraph is constructed for each type of data. During hypergraph learning, one hypergraph is chosen as the core hypergraph each time, and the remaining hypergraphs help the core hypergraph optimize the relevance matrix for prediction. The method not only takes into account the links among subjects but also makes use of a range of different types of data to improve identification performance.

10.2.2 Medical Image Retrieval

Medical image retrieval is another crucial application of computer-aided diagnosis in Alzheimer’s disease, along with the classification of patients as MCI or normal control introduced above. Its main goal is to provide clinicians with relevant MCI examples whose imaging data are visually comparable. Such data can also be provided to doctors in medical practice for case-based reasoning or evidence-based medicine.

The MCI diagnosis-aided medical image retrieval technique [2] consists of two primary stages, i.e., query category prediction for candidate selection, and ranking. The first stage finds the subjects in the database that are most relevant to the query subject and uses this knowledge to predict, in a supervised manner, the query subject’s category, i.e., MCI or NC in this case. To this end, the graphs built from the pairwise subject distances in the various data modalities are combined into a multi-graph that predicts the category of the query, after which every subject in the same category as the query is regarded as a candidate. In the second stage, the query subject and all of the candidate subjects are represented together in a new multi-graph. Learning on this multi-graph reveals how closely each candidate is related to the query subject, allowing the candidates to be ranked by similarity. The details of the two stages are shown in Fig. 10.2 [2] and explained below.

Fig. 10.2
A flow diagram. It has query category prediction, candidate search results, learning-based ranking, and search results. A query subject points to query category prediction and learning-based ranking. Selected training data points to query category prediction and candidate search results.

The pipeline for medical image retrieval method. This figure is from [2]

Given the query imaging data, the query category is first predicted using the subjects in the database, so that candidates can then be chosen based on the result. To analyze the similarity between the query subject and the training subjects chosen from the database, a graph \(\mathbb {G}_i=\langle \mathbb {V}_i, \mathbb {E}_i, {\mathbf {W}}_i \rangle \) with \(N + 1\) vertices is generated for the imaging data of the \(i\)-th modality out of \(N_{mod}\) modalities. The weight \({\mathbf {W}}_i(v_s, v_t)\) of edge \(\mathbb {E}_i(v_s,v_t)\), which connects the \(s\)-th and \(t\)-th vertices of the graph \(\mathbb {G}_i\), is given by

$$\displaystyle \begin{aligned} {\mathbf{W}}_i(v_s,v_t) = \exp\Big( -\frac{d^2(v_s,v_t)}{\sigma_i^2}\Big ), \end{aligned} $$
(10.7)

where \(d(v_s, v_t)\) represents the Euclidean distance between vertices \(v_s\) and \(v_t\) in the feature space. Similar to the procedure for MCI identification, the objective of the multi-graph learning task for query category prediction can be written as

$$\displaystyle \begin{aligned} \arg \underset{\mathbf{F},\boldsymbol{\omega}}{\min} &\Big \{\sum_{i=1}^{N_{mod}} \omega_i \varOmega_i(\mathbf{F}) + \mu\mathbb{R}(\mathbf{F})+\eta \|\boldsymbol{\omega} \|{}_2^2 \Big \},\\ \mathit{s}.\mathit{t}.~&\sum_{i=1}^{N_{mod}}\omega_i=1, \end{aligned} $$
(10.8)

where \(\boldsymbol{\omega}\) and \(\mathbf{F}\) represent the weighting parameters and the relevance matrix, respectively, \(\mu\) and \(\eta\) represent the trade-off hyperparameters, \(\mathbb {R}\) represents the empirical loss, and \(\varOmega_i\) represents the regularizer term defined as

$$\displaystyle \begin{aligned} \varOmega_i = \frac{1}{2}\sum_{v_s,v_t}{\mathbf{W}}_i(v_s,v_t)\| \frac{\mathbf{F}(v_s,\cdot)}{\sqrt{{\mathbf{D}}_i(v_s,v_s)}}-\frac{\mathbf{F}(v_t,\cdot)}{\sqrt{{\mathbf{D}}_i(v_t,v_t)}}\|{}^2. \end{aligned} $$
(10.9)

To solve the aforementioned optimization problem, \(\mathbf{F}\) and \(\boldsymbol{\omega}\) can be optimized alternately. When \(\boldsymbol{\omega}\) is fixed, the optimization problem for \(\mathbf{F}\) is written as

$$\displaystyle \begin{aligned} \arg \underset{\mathbf{F}}{\min} \left\{\sum_{i=1}^{N_{mod}} \omega_i \varOmega_i(\mathbf{F}) + \mu\mathbb{R}(\mathbf{F}) \right\}, \end{aligned} $$
(10.10)

which can be solved using the iterative process [9] formulated as

$$\displaystyle \begin{aligned} \mathbf{F}(t+1) = \frac{1}{\mu + 1} \sum_{i=1}^{N_{mod}}\omega_i\varTheta_i\mathbf{F}(t)+\frac{\mu}{\mu + 1}\mathbf{Y}, \end{aligned} $$
(10.11)

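where \(\mathbf{F}(t)\) is the relevance matrix at the \(t\)-th iteration, initialized with \(\mathbf{F}(0) = \mathbf{Y}\), and \(\varTheta_i = {\mathbf{D}}_i^{-1/2}{\mathbf{W}}_i{\mathbf{D}}_i^{-1/2}\) is the normalized adjacency matrix of the \(i\)-th graph. When \(\mathbf{F}\) is fixed, the optimization problem for \(\boldsymbol{\omega}\) can be formulated as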

$$\displaystyle \begin{aligned} \arg \underset{\boldsymbol{\omega}}{\min} & \left\{\sum_{i=1}^{N_{mod}} \omega_i \varOmega_i(\mathbf{F}) + \eta \|\boldsymbol{\omega} \|{}_2^2 \right\}, \\ \mathit{s}.\mathit{t}.~&\sum_{i=1}^{N_{mod}}\omega_i=1, \end{aligned} $$
(10.12)

which can be solved by applying the Lagrangian method. Based on the predicted category of the query subject, all database subjects belonging to that category are taken as candidate retrieval results.
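The category prediction step can be sketched as follows, with Gaussian edge weights that decay with distance and the iteration of Eq. (10.11) run for a fixed number of steps with \(\boldsymbol{\omega}\) held fixed. The function names, the median-distance bandwidth heuristic, the removal of self-loops, and the iteration count are assumptions of this sketch rather than details given in [2].

```python
import numpy as np

def gaussian_theta(X, sigma=None):
    """Normalised adjacency Theta = D^{-1/2} W D^{-1/2} of a Gaussian graph."""
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    if sigma is None:
        # Heuristic bandwidth (assumption): root of the median squared distance.
        sigma = np.sqrt(np.median(d2[d2 > 0]))
    W = np.exp(-d2 / sigma ** 2)
    np.fill_diagonal(W, 0.0)          # no self-loops (assumption)
    d = W.sum(axis=1)
    return W / np.sqrt(np.outer(d, d))

def propagate(thetas, omega, Y, mu=1.0, n_iter=200):
    """Iterate F <- (1/(mu+1)) * sum_i omega_i Theta_i F + (mu/(mu+1)) * Y."""
    theta = sum(w * t for w, t in zip(omega, thetas))
    F = Y.copy()
    for _ in range(n_iter):
        F = theta @ F / (mu + 1.0) + mu / (mu + 1.0) * Y
    return F
```

In use, the rows of `Y` hold one-hot labels for the training subjects and zeros for the query, and the predicted category of the query is the index of the largest entry in its row of the returned matrix. Since the propagation matrix is scaled by \(1/(\mu+1)\), the iteration is a contraction and approaches the fixed point \(\frac{\mu}{\mu+1}(\mathbf{I}-\frac{1}{\mu+1}\sum_i\omega_i\varTheta_i)^{-1}\mathbf{Y}\).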

The candidates are then ranked to retrieve the most relevant subjects. Even though they belong to the same category as the query subject, they may still differ from each other in imaging appearance. In a manner similar to the previous classification step, a graph over the candidate subjects and the query subject is constructed for each of the \(N_{mod}\) modalities, where the \(i\)-th graph is denoted by \(\hat {\mathbb {G}}_i \). Since the graph weights \(\boldsymbol{\omega}\) have already been learned, the optimization problem can be written as

$$\displaystyle \begin{aligned} \arg \underset{\hat{\mathbf{f}}}{\min} \Bigg \{\sum_{i=1}^{N_{mod}} \omega_i \hat{\varOmega}_i(\hat{\mathbf{f}}) + \hat{\lambda}\hat{\mathbb{R}}(\hat{\mathbf{f}}) \Bigg \}, \end{aligned} $$
(10.13)

where \(\hat {\mathbf {f}}\) and \(\hat {\varOmega }\) represent the relevance vector and the graph regularizer, respectively, and \(\hat {\mathbb {R}}\) is the empirical loss. As with Eq. (10.10), this optimization task is handled using an iterative procedure, represented by

$$\displaystyle \begin{aligned} \hat{\mathbf{f}}(t+1)=\frac{1}{\hat{\lambda} + 1} \sum_{i=1}^{N_{mod}}\omega_i\hat{\varTheta}_i\hat{\mathbf{f}}(t)+\frac{\hat{\lambda}}{\hat{\lambda} + 1}\hat{\mathbf{y}}. \end{aligned} $$
(10.14)

The ranking of all candidates can be established by sorting based on the correlation given by \(\hat {\mathbf {f}}\).

This subsection introduces the process of retrieving data relevant to the query subject from medical imaging datasets to support the diagnosis of MCI. The first stage selects the candidate set from the database, and the second stage computes the correlation between the query subject and all of the subjects in the candidate set and then ranks the retrieval based on the correlation. Both stages employ multi-graphs to describe the relationship between subjects, so as to facilitate retrieval tasks.

10.2.3 COVID-19 Identification Using CT Imaging

The COVID-19 pandemic, the most widespread public health crisis since late 2019, is caused by an extremely infectious virus that can induce multiple organ failure and severe respiratory distress. It is therefore crucial to correctly distinguish COVID-19 from other forms of pneumonia in order to design appropriate treatment programs. Nevertheless, the task is complex, as there are two main difficulties, namely noisy data resulting from the highly varied data gathered during the crisis, and confusing cases resulting from the similarity between COVID-19 and other types of pneumonia in the initial phases of symptoms.

Numerous investigations have demonstrated the usefulness of CT for differentiating between COVID-19 and other types of pneumonia, leading to the introduction of an uncertainty vertex-weighted hypergraph learning strategy to identify COVID-19 from other types of pneumonia using CT images [4]. It formulates data correlations among various instances and limits the interference of noisy data and confusing examples by employing an uncertainty score quantification module and a vertex-weighted hypergraph structure. The introduction of the framework that follows is divided into three parts, namely pre-processing, measuring data uncertainty, and hypergraph construction and learning. Figure 10.3 illustrates the overall framework.

Fig. 10.3
An illustration. The hypergraph learning method for identifying COVID-19 starts with scans. It undergoes data uncertainty measurement through aleatoric and epistemic uncertainty. The uncertainty-vertex hypergraph modeling and learning then help differentiate between COVID-19 and CAP.

An illustration of the uncertainty vertex-weighted hypergraph learning method for identifying COVID-19 among other types of pneumonia. This figure is from [4]

During the pre-processing stage, regional features and radiomics features are extracted from the CT scan of every patient, which is segmented using VB-Net [10]. Regional features include the number of infected lesions and the surface area of the lesions, whereas radiomics features include textural features such as the gray-level co-occurrence matrix. The feature representation \(\mathbf{X}\) of a patient’s CT image is constructed by combining the two categories of features with information on age and gender.

Since noise can affect data quality, measuring data uncertainty is crucial for determining the reliability of different data throughout the learning process. Two types of uncertainty are measured, aleatoric and epistemic. The former results from data abnormalities, noise, or other issues that lower the data quality, and the latter arises when the features of a case lie near the decision boundary. The goal of parameter estimation under aleatoric uncertainty is to minimize the KL divergence between the actual and the predicted distributions, which can be represented by

$$\displaystyle \begin{aligned} \hat{\varTheta} = \underset{\varTheta}{\arg \min} \frac{1}{N} \sum_{i=1}^{N} D_{KL}(P_D({\mathbf{X}}_i)||P_\varTheta({\mathbf{X}}_i)), \end{aligned} $$
(10.15)

where \(P_D({\mathbf{X}}_i)\) and \(P_\varTheta({\mathbf{X}}_i)\) represent the real distribution and the predicted distribution, respectively. For optimization, the loss function is expressed as

$$\displaystyle \begin{aligned} \mathbb{L}(\varTheta)=\frac{1}{N}\sum_i^N\left(\frac{1}{2}\exp(-\alpha_\varTheta({\mathbf{X}}_i))\mathbb{C}\mathbb{E}\left({\mathbf{y}}_i,f_\varTheta({\mathbf{X}}_i)\right)+\frac{1}{2}\alpha_\varTheta({\mathbf{X}}_i)\right), \end{aligned} $$
(10.16)

where \(\alpha_\varTheta({\mathbf{X}}_i)\) represents the log value of the estimated variance, and the aleatoric uncertainty is defined as \(A_\varTheta ({\mathbf {X}}_i)=\exp (\alpha _\varTheta ({\mathbf {X}}_i))\). The epistemic uncertainty, which reflects the model’s inability to generate accurate predictions, can be estimated by applying dropout at inference time and is written as

$$\displaystyle \begin{aligned} \mathbb{E}(f_{\hat{\varTheta}}({\mathbf{X}}_i)) \approx \frac{1}{K} \sum_{k=1}^K f_{\hat{\varTheta}(\omega^k)}({\mathbf{X}}_i)^\top f_{\hat{\varTheta}(\omega^k)}({\mathbf{X}}_i)-\mathbf{E}(f_{\hat{\varTheta}(\omega^k)}({\mathbf{X}}_i))^\top \mathbf{E}(f_{\hat{\varTheta}(\omega^k)}({\mathbf{X}}_i)), \end{aligned} $$
(10.17)

where \(\omega^k\) represents the set of random variables in the \(k\)-th test with dropout, \(K\) is the number of such tests, and \(\mathbf{E}(\cdot)\) denotes the mean prediction over them. Here, the overall uncertainty is \(\mathbb {U}_{\hat {\varTheta }}({\mathbf {X}}_i) = A_{\hat {\varTheta }}({\mathbf {X}}_i) + \mathbb {E}(f_{\hat {\varTheta }}({\mathbf {X}}_i))\). With normalization, the final uncertainty can be formulated as

$$\displaystyle \begin{aligned} U_i=\sigma \Big (\lambda \frac{\mathbb{U}_{\hat{\varTheta}}({\mathbf{X}}_i)-\mu_e}{{\mathbf{s}}_e} \Big), \end{aligned} $$
(10.18)

where \(\mu_e\) and \({\mathbf{s}}_e\) represent the mean and the standard deviation of \(\mathbb {U}\), and \(\sigma\) stands for the sigmoid function, which maps the output to a value between 0 and 1.
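The uncertainty computations of Eqs. (10.16) to (10.18) can be illustrated with plain NumPy, assuming that the network outputs are already available as arrays. The array layout (`K` stochastic passes, `N` cases, `C` classes), the function names, and the default scale `lam` are assumptions of this sketch; the training of the network itself is not shown.

```python
import numpy as np

def aleatoric_loss(log_var, ce):
    """Eq. (10.16): cross-entropy attenuated by the predicted log-variance.

    log_var : (N,) predicted log-variance alpha(X_i) per case.
    ce      : (N,) cross-entropy between label and prediction per case.
    """
    return np.mean(0.5 * np.exp(-log_var) * ce + 0.5 * log_var)

def aleatoric(log_var):
    """Aleatoric uncertainty A(X_i) = exp(alpha(X_i))."""
    return np.exp(log_var)

def epistemic(mc_probs):
    """Eq. (10.17) from K dropout passes; mc_probs has shape (K, N, C)."""
    second_moment = (mc_probs ** 2).sum(axis=-1).mean(axis=0)  # (1/K) sum_k f^T f
    mean = mc_probs.mean(axis=0)                               # E(f)
    return second_moment - (mean ** 2).sum(axis=-1)            # minus E(f)^T E(f)

def normalised_uncertainty(log_var, mc_probs, lam=1.0):
    """Eq. (10.18): standardise the total uncertainty, then apply a sigmoid."""
    total = aleatoric(log_var) + epistemic(mc_probs)
    z = (total - total.mean()) / total.std()
    return 1.0 / (1.0 + np.exp(-lam * z))
```

The epistemic term equals the trace of the sample covariance of the predictions, so it is zero when all dropout passes agree and grows as they diverge.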

Each instance is viewed as a vertex in the hypergraph, which is constructed to mine the high-order correlations among related patients for more precise prediction. Regional and radiomics features are each used in the construction of hyperedges. In the regional feature space, every vertex is regarded as a center vertex, and the nearest neighbor algorithm is used to link its \(K\) nearest vertices to build a hyperedge. The same method is applied to generate hyperedges using the radiomics features. In contrast to the usual hypergraph, the uncertainty hypergraph must take into account both the connection relationship and the vertex’s uncertainty score, leading to a more comprehensive definition of the incidence matrix in the uncertainty vertex hypergraph \(\mathbb {G}=\langle \mathbb {V}, \mathbb {E}, \mathbf {W}, \mathbf {U} \rangle \) as

$$\displaystyle \begin{aligned} \mathbf{H}(v_j, e_i) = \begin{cases} U_j & \text{if}~v_j \in e_i \\ 0 & \text{otherwise} \end{cases} . \end{aligned} $$
(10.19)

In contrast to conventional hypergraph learning strategies, this structure takes data uncertainty into account, and its optimization objective can be expressed as

$$\displaystyle \begin{aligned} \left \{ \begin{array}{ll} \mathbb{Q}_{\mathbf{U}}(\mathbf{F}) &= \arg \min_{\mathbf{F}}\{\varOmega(\mathbf{F}) + \lambda \mathbb{R}_{emp}(\mathbf{F}) \} \\ \varOmega(\mathbf{F}, \mathbb{V}, \mathbf{U}, \mathbb{E}, \mathbf{W}) &= tr({\mathbf{F}}^\top({\mathbf{U}}^\top-{\mathbf{U}}^\top \varTheta_{\mathbf{U}}\mathbf{U})\mathbf{F}) \\ \mathbb{R}_{emp}(\mathbf{F}, \mathbf{U}) &= \sum_{k=1}^K ||\mathbf{F}(:,k)-\mathbf{Y}(:,k)||{}^2 \end{array} \right. , \end{aligned} $$
(10.20)

where \(\varOmega(\cdot)\) and \( \mathbb {R}_{emp}(\cdot )\) represent the regularizer and the empirical loss, respectively, and \(\varTheta_{\mathbf{U}}\) is equal to \({\mathbf {D}}_v^{-1/2}\mathbf {H} \mathbf {W} {\mathbf {D}}_e^{-1} {\mathbf {H}}^\top {\mathbf {D}}_v^{-1/2}\). Weighting each vertex by its uncertainty score, the empirical loss can be rewritten as

$$\displaystyle \begin{aligned} \mathbb{R}_{emp}(\mathbf{F}, \mathbf{U}) = tr({\mathbf{F}}^\top{\mathbf{U}}^\top\mathbf{U}\mathbf{F}+{\mathbf{Y}}^\top{\mathbf{U}}^\top\mathbf{U}\mathbf{Y}-2{\mathbf{F}}^\top{\mathbf{U}}^\top\mathbf{U}\mathbf{Y}). \end{aligned} $$
(10.21)

The output matrix \(\mathbf {F} \in \mathbb {R}^{n \times K}\) (K representing the number of classes, i.e., K = 2 in this case) is thus represented as

$$\displaystyle \begin{aligned} \mathbf{F}=\lambda({\mathbf{U}}^\top-{\mathbf{U}}^\top \varTheta_{\mathbf{U}}\mathbf{U}+\lambda {\mathbf{U}}^\top \mathbf{U})^{-1}{\mathbf{U}}^\top \mathbf{U} \mathbf{Y}. \end{aligned} $$
(10.22)

Newly arriving test cases can be classified as COVID-19 or other pneumonia types using the output label matrix established above.
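The vertex-weighted construction of Eq. (10.19) and the closed-form output of Eq. (10.22) can be sketched as below. Several points are assumptions made for illustration: \(\mathbf{U}\) is taken to be the diagonal matrix of the per-vertex scores \(U_i\), all hyperedge weights are set to 1, each hyperedge contains its center vertex plus its `k` nearest neighbors, and the function names are hypothetical.

```python
import numpy as np

def knn_incidence(X, k, u):
    """H[v, e] = u[v] if v is the centre of e or one of its k nearest vertices."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, -1.0)            # the centre always sorts first
    H = np.zeros((n, n))
    for c in range(n):
        nbrs = np.argsort(d2[c])[:k + 1]  # centre plus k neighbours
        H[nbrs, c] = u[nbrs]
    return H

def uncertainty_hypergraph_predict(H, u, Y, lam=1.0):
    """F = lam * (U^T - U^T Theta U + lam U^T U)^{-1} U^T U Y (Eq. 10.22)."""
    m = H.shape[1]
    w = np.ones(m)                        # unit hyperedge weights (assumption)
    dv = H @ w                            # vertex degrees
    de = H.sum(axis=0)                    # hyperedge degrees
    theta = (H / np.sqrt(dv)[:, None] * (w / de)) @ (H.T / np.sqrt(dv)[None, :])
    U = np.diag(u)                        # diagonal score matrix (assumption)
    A = U.T - U.T @ theta @ U + lam * U.T @ U
    return lam * np.linalg.solve(A, U.T @ U @ Y)
```

Rows of `Y` are one-hot for labeled training cases and zero for test cases; a test case is assigned the class with the largest entry in its row of `F`. The scores in `u` are assumed to lie strictly between 0 and 1, as produced by the sigmoid in Eq. (10.18).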

10.2.4 ASD Identification Using Brain Functional Networks

Autism spectrum disorder (ASD) is a widespread developmental disorder that mostly affects children and has negative effects such as social communication impairments. Because of the rising cases, early identification and treatment of ASD are crucial in order to provide patients with new skills under clinical supervision. The diagnosis of ASD is mostly dependent on skilled specialists, and it is difficult to identify ASD quickly due to the shortage of experts. The correlation of various functional connectivity (FC) pattern features in ASD patients can be used for rapid diagnosis.

The ASD identification method using brain functional networks [3] is divided into three stages, namely the selection of pre-processed features, hypergraph construction, and object identification using dynamic hypergraph learning. The overall process is shown in Fig. 10.4. In the first stage, static FC (sFC) and dynamic FC (dFC) are produced using a sliding window algorithm on the original functional magnetic resonance imaging time series, and Lasso regression is then employed for feature selection. The hypergraph construction stage creates a hypergraph based on the comparison of image features that represent data similarity in multiple modalities. Finally, ASD is identified using a multi-modal dynamic hypergraph learning technique that detects ASD and simultaneously improves the hypergraph structure.

Fig. 10.4
A flow chart. It has input data for subjects 1 to N. Through feature selection, it enters hypergraph construction which takes data from static and dynamic modalities and produces potential function and hyperedge weight. It leads to dynamic hypergraph learning with A S D diagnosis.

A pipeline to classify ASD or healthy controls from brain functional networks data using dynamic hypergraph learning. This figure is from [3]

The feature selection stage aims to discover valuable features in the dFC and sFC sequences. The \(i\)-th subject’s dFC sequence of \(\tau\) time points is first separated into \(n\) sub-sequences, with the \(j\)-th sub-sequence consisting of the time points \(\{j, n + j, 2n + j, \ldots\}\). Defining \(\bar {\mathbf {z}}_i^j\) as the dynamic FC feature of the \(j\)-th sub-sequence in subject \(i\), the Lasso regression model, as the selection operator, can be expressed as

$$\displaystyle \begin{aligned} \arg \underset{\beta_0, \beta}{\min} \Big( \frac{1}{2\tau' |\mathbb{P}|} \sum_{i\in \mathbb{P}}\sum_{j=1}^{\tau'} \left(y_i - \beta_0 - \beta^\top \bar{\mathbf{z}}_i^j\right)^2+\mu |\beta|{}_1 \Big), \end{aligned} $$
(10.23)

where \(\tau' = \tau / n\) is the length of the sub-sequences, \(y_i\) represents the label of the subject, \(\beta\) is the regression coefficient, and \(\mu\) stands for the trade-off hyperparameter. Features with zero coefficients are discarded, and the remaining ones are denoted by \({\mathbf {z}}_i^j\). Defining \(\bar {\mathbf {x}}_i\) as the static FC feature of the \(i\)-th subject, the Lasso regression model is expressed as

$$\displaystyle \begin{aligned} \arg \underset{\gamma_0, \gamma}{\min} \Big( \frac{1}{2 |\mathbb{P}|} \sum_{i\in \mathbb{P}} \left(y_i - \gamma_0 - \gamma^\top \bar{\mathbf{x}}_i\right)^2+\eta |\gamma|{}_1 \Big), \end{aligned} $$
(10.24)

where \(y_i\) represents the label of the subject, \(\gamma\) is the regression coefficient, and \(\eta\) stands for the trade-off hyperparameter. As with dFC, the features with non-zero coefficients in the sFC selection operator are selected and denoted by \({\mathbf{x}}_i\).
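The interleaved split and the Lasso-based selection can be sketched as follows. The proximal gradient (ISTA) solver is one standard way to minimize an objective of the form in Eq. (10.24) and is not necessarily the solver used in [3]; the function names, iteration count, and synthetic data are assumptions, and indices are 0-based here whereas the text counts from 1.

```python
import numpy as np

def interleaved_subsequences(seq, n):
    """Split a sequence into n sub-sequences {j, n+j, 2n+j, ...} (0-based j)."""
    return [seq[j::n] for j in range(n)]

def lasso_ista(X, y, mu=0.1, n_iter=2000):
    """Minimise (1/(2m)) * ||y - b0 - X b||^2 + mu * ||b||_1 by ISTA.

    X : (m, p) feature matrix, y : (m,) targets.
    Returns the intercept b0 and the sparse coefficient vector b.
    """
    m = X.shape[0]
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring absorbs the intercept
    step = m / np.linalg.norm(Xc, 2) ** 2    # 1 / Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = Xc.T @ (Xc @ b - yc) / m
        z = b - step * grad
        b = np.sign(z) * np.maximum(np.abs(z) - step * mu, 0.0)  # soft threshold
    b0 = y_mean - x_mean @ b
    return b0, b

# Synthetic example: only features 0 and 3 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + 1.0
b0, b = lasso_ista(X, y, mu=0.1)
selected = np.flatnonzero(b)                 # indices of the retained features
```

Features whose coefficient is exactly zero are discarded, and the columns indexed by `selected` are kept for the subsequent hypergraph construction.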

The dFC sub-hypergraph \(\mathbb {G}_1=(\mathbb {V},\mathbb {E}_1)\) and the sFC sub-hypergraph \(\mathbb {G}_2=(\mathbb {V},\mathbb {E}_2)\), in which every vertex stands for a sub-sequence of a subject, are combined to construct the hypergraph \(\mathbb {G}=(\mathbb {V},\mathbb {E})\), i.e., \(\mathbb {E} = \mathbb {E}_1 \cup \mathbb {E}_2\). Since sFC features are subject level, each sub-sequence inherits the static feature of its subject, i.e., \({\mathbf {x}}_i^j = {\mathbf {x}}_i\). Each vertex in each sub-hypergraph is regarded as a central vertex, and the nearest neighbor algorithm is employed to connect its k neighbors (k = 2n, 3n, …, \(k_{\max} n\)) to create \(k_{\max}\) hyperedges. Once the two sub-hypergraphs are generated, the hypergraph is formed as well, and its incidence matrix is expressed as

$$\displaystyle \begin{aligned} \mathbf{H}(v, e) = \begin{cases} 1 & \text{if}~v \in e \\ 0 & \text{otherwise} \end{cases} . \end{aligned} $$
(10.25)
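As a rough illustration of this construction, the sketch below builds a nearest-neighbor incidence matrix in the sense of Eq. (10.25), with one hyperedge per central vertex and per neighborhood size. The function name `knn_incidence`, the squared Euclidean distance, and the toy features are assumptions made for the example and are not taken from [3].

```python
import numpy as np


def knn_incidence(features, ks):
    """Return H with H[v, e] = 1 if vertex v belongs to hyperedge e, else 0.

    Every vertex acts once as a central vertex for each neighbourhood size in
    `ks`, so H has shape (n_vertices, n_vertices * len(ks)).
    """
    n_vertices = features.shape[0]
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    # Force every vertex to rank first in its own neighbour list.
    np.fill_diagonal(dist, -np.inf)
    order = np.argsort(dist, axis=1, kind="stable")
    H = np.zeros((n_vertices, n_vertices * len(ks)))
    for block, k in enumerate(ks):
        for centre in range(n_vertices):
            members = order[centre, : k + 1]  # the centre and its k neighbours
            H[members, block * n_vertices + centre] = 1.0
    return H
```

Under the same assumptions, the union \(\mathbb {E} = \mathbb {E}_1 \cup \mathbb {E}_2\) corresponds to concatenating the columns of the two modality-specific incidence matrices, e.g., `np.hstack([H_dfc, H_sfc])`.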

To enhance the structure of the hypergraph and to help predict ASD, the potential function of a hyperedge is defined as

$$\displaystyle \begin{aligned} f(e) = \sum_{u,v\in \mathbb{V}}\frac{\mathbf{H}(u, e)\mathbf{H}(v, e)g(u,v)}{(1+\alpha_1+\alpha_2)\delta(e)} , \end{aligned} $$
(10.26)

where

$$\displaystyle \begin{aligned} g(u,v) &= \|\frac{\hat{y}_u}{\sqrt{d(u)}} - \frac{\hat{y}_v}{\sqrt{d(v)}} \|{}_2^2 + \alpha_1 \|\frac{{\mathbf{x}}_u}{\sqrt{d(u)}} - \frac{{\mathbf{x}}_v}{\sqrt{d(v)}} \|{}_2^2 \\ &\quad + \alpha_2 \|\frac{{\mathbf{z}}_u}{\sqrt{d(u)}} - \frac{{\mathbf{z}}_v}{\sqrt{d(v)}} \|{}_2^2 \end{aligned}. $$
(10.27)

Here δ(e) represents the degree of hyperedge e, d(u) and d(v) are the degrees of vertices u and v, \(\hat {y}_u, \hat {y}_v\) stand for the to-be-learned labels of u, v, respectively, and \(\alpha_1\), \(\alpha_2\) are the trade-off hyperparameters. It is noted that the potential function determines the data distribution on the hyperedge jointly from the sFC, dFC, and label spaces. The dynamic hypergraph learning cost function is formulated as

$$\displaystyle \begin{aligned} {\mathbb{L}(\hat{\mathbf{y}}, \mathbf{H})} = \sum_{e\in \mathbb{E}} \omega(e)f(e)+\theta \|\mathbf{y}-\hat{\mathbf{y}} \|{}_2^2+\lambda \| \mathbf{H}-{\mathbf{H}}_0 \|{}_2^2 , \end{aligned} $$
(10.28)

where ω(e) stands for the weight of hyperedge e, \({\mathbf {H}}_0\) represents the initial hypergraph, and θ and λ are the trade-off hyperparameters. The objective function consists of three terms: the first term is the loss function based on the hypergraph, and the following two terms are the empirical losses of \(\hat {\mathbf {y}}\) and H. The optimization of Eq. (10.28) alternates between two stages. First, we optimize the to-be-learned labels \(\hat {\mathbf {y}}\) with H fixed. This problem has the closed-form solution

$$\displaystyle \begin{aligned} \hat{\mathbf{y}} = \Big (\mathbf{I}+\frac{1}{\theta(1+\alpha_1+\alpha_2)}\varDelta \Big )^{-1} \mathbf{y} , \end{aligned} $$
(10.29)

where \(\varDelta = \mathbf {I}-{\mathbf {D}}_v^{-1/2}\mathbf {H} \mathbf {W} {\mathbf {D}}_e^{-1} {\mathbf {H}}^\top {\mathbf {D}}_v^{-1/2}\). Here I, \({\mathbf {D}}_v\), and \({\mathbf {D}}_e\) represent the identity matrix, the diagonal matrix of vertex degrees, and the diagonal matrix of hyperedge degrees, respectively, and W is the diagonal matrix of hyperedge weights. Next, we optimize H with \(\hat {\mathbf {y}}\) fixed as

$$\displaystyle \begin{aligned} {\mathbb{L}( \mathbf{H})} = \text{tr} \Big((\mathbf{I}-{\mathbf{D}}_v^{-1/2}\mathbf{H} \mathbf{W} {\mathbf{D}}_e^{-1} {\mathbf{H}}^\top {\mathbf{D}}_v^{-1/2})\mathbf{K} \Big) + \lambda \| \mathbf{H}-{\mathbf{H}}_0 \|{}_2^2 , \end{aligned} $$
(10.30)

where \(\mathbf {K}=(\hat {\mathbf {y}}\hat {\mathbf {y}}^\top +\alpha _1 \mathbf {X}{\mathbf {X}}^\top + \alpha _2 \mathbf {Z}{\mathbf {Z}}^\top ) / (1+\alpha _1+\alpha _2)\). Eq. (10.30) is minimized using the projected gradient method, whose iterative procedure is formulated as

$$\displaystyle \begin{aligned} {\mathbf{H}}_{k+1} &= \mathbf{P} [{\mathbf{H}}_{k}-h_k \nabla \mathbb{L}({\mathbf{H}}_{k})] \\ \nabla \mathbb{L}(\mathbf{H}) &= 2\lambda(\mathbf{H}-{\mathbf{H}}_{0})+\mathbf{J}(\mathbf{I}\otimes {\mathbf{H}}^\top {\mathbf{D}}_v^{-1/2}\mathbf{K} {\mathbf{D}}_v^{-1/2} \mathbf{H})\mathbf{W}{\mathbf{D}}_e^{-2} \\ &\quad + {\mathbf{D}}_v^{-3/2}\mathbf{H} \mathbf{W} {\mathbf{D}}_e^{-1} {\mathbf{H}}^\top {\mathbf{D}}_v^{-1/2}\mathbf{KJW} \\ &\quad -2{\mathbf{D}}_v^{-1/2}\mathbf{K}{\mathbf{D}}_v^{-1/2}\mathbf{HW}{\mathbf{D}}_e^{-1} \end{aligned}, $$
(10.31)

where \(\mathbf {J} = \mathbf {1}{\mathbf {1}}^\top\), \(h_k\) represents the optimization step size of the k-th iteration, and P stands for the projection onto the set {H | 0 ≼ H ≼ 1}. When the iterative process converges, the labels of each subject’s sub-sequences are aggregated, and the predicted category of the subject is the one with the highest score after aggregation.
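The alternating scheme can be sketched as below, under several stated assumptions. The hyperedge weights are passed as a vector `w`, the vertex degree is taken as \(d(v)=\sum_e \omega(e)\mathbf{H}(v,e)\), and the label step solves the linear system \((\mathbf{I}+\varDelta/(\theta(1+\alpha_1+\alpha_2)))\hat{\mathbf{y}}=\mathbf{y}\). For brevity, the structure step replaces the analytic gradient of Eq. (10.31) with a central finite-difference approximation of the gradient of Eq. (10.30), so it illustrates the projected gradient idea rather than reproducing the exact update. All function names are hypothetical.

```python
import numpy as np


def normalized_adjacency(H, w):
    """D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} for incidence H and weights w."""
    d_v = H @ w            # vertex degrees (assumed weighted by w)
    d_e = H.sum(axis=0)    # hyperedge degrees
    Hn = H / np.sqrt(d_v)[:, None]
    return (Hn * (w / d_e)) @ Hn.T


def update_labels(H, w, y, theta, alpha1, alpha2):
    """Label step with H fixed: solve (I + Delta / (theta * c)) y_hat = y."""
    n = H.shape[0]
    delta = np.eye(n) - normalized_adjacency(H, w)
    system = np.eye(n) + delta / (theta * (1.0 + alpha1 + alpha2))
    return np.linalg.solve(system, y)


def structure_loss(H, H0, w, K, lam):
    """Objective of the structure step, tr(Delta K) + lam * ||H - H0||^2."""
    n = H.shape[0]
    delta = np.eye(n) - normalized_adjacency(H, w)
    return np.trace(delta @ K) + lam * np.sum((H - H0) ** 2)


def projected_gradient_step(H, H0, w, K, lam, step, eps=1e-6):
    """One update H <- P[H - step * grad] with a finite-difference gradient."""
    grad = np.zeros_like(H)
    for idx in np.ndindex(*H.shape):
        bump = np.zeros_like(H)
        bump[idx] = eps
        grad[idx] = (structure_loss(H + bump, H0, w, K, lam)
                     - structure_loss(H - bump, H0, w, K, lam)) / (2.0 * eps)
    # Projection onto the box {H | 0 <= H <= 1} is an entry-wise clip.
    return np.clip(H - step * grad, 0.0, 1.0)
```

In a full loop, `update_labels` and `projected_gradient_step` would be called in turn, with K rebuilt from the current \(\hat {\mathbf {y}}\), until the objective stops changing.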

In this section, we demonstrate the use of hypergraph-based approaches in four computer-aided diagnosis applications, namely MCI identification, medical image retrieval for MCI diagnostic assistance, COVID-19 identification, and ASD identification. In these applications, hypergraphs represent high-order connections among subjects, so that mining the complicated links among patients yields more knowledge than their images alone. In the future, it could be crucial to use hypergraphs to investigate few-shot learning approaches and transfer learning strategies for medical conditions such as MCI, COVID-19, and ASD.

10.3 Survival Prediction with Histopathological Image

Survival prediction models survival duration, i.e., the period over which a patient is followed up until a certain event, e.g., cancer recurrence or death. Survival prediction based on histopathological images aims to predict the survival duration or survival risk to a satisfactory degree using only the patient’s images, to estimate the severity, or to classify high and low risks, which guides the pathologist in evaluating the case. Since histopathological images typically contain gigapixels and are far more detailed than regular natural images, e.g., those in ImageNet [11] or MNIST [12], the main challenge of this task is how to reliably obtain the patient’s feature representation for regression prediction analysis. Moreover, the relevant information for cells and tissues may not be readily extracted, as histopathological images include complex relationships and rich morphological structural content.

To overcome the challenge of the large number of pixels, one technique [13] randomly chooses patches from histopathological images that contain a variety of cells and no blank regions. It extracts patch features using a pre-trained CNN and calculates survival risk using Lasso-Cox [14] regression. To enhance the patient representation, low-level patch features produced by a pre-trained CNN-based feature extractor are refined by a graph convolutional neural network that models the intricate relationships between patches [15]. The limited ability of random patch selection to cover the details of the original histopathological image and the lack of mutual information between patches restrict the representation learning capability of the non-graph-based method, whereas the graph-based method uses pairwise correlation modeling to make up for the loss of structural information among cells with similar roles. Nevertheless, reducing complex high-order connections to pairwise relationships inevitably results in inaccurate modeling, losing data correlations among cells and tissues that are necessary to predict survival. Hence, a better solution is to model high-order data correlations with hypergraph computation approaches.

The following subsections explain how to use hypergraph computation in survival prediction based on histopathological images in two parts, namely ranking-based survival prediction [5] and phenotypic and topological hypergraph-based survival prediction [6]. In the first part, a nearest-neighbor-based hypergraph modeling methodology is introduced, and optimization is achieved using a ranking-based method. In the second part, hypergraphs are created in the latent feature space and in the image space and are merged for prediction.

10.3.1 Ranking-Based Survival Prediction

This part describes the three stages required for executing the ranking-based survival prediction task via hypergraph representation [5], namely pre-processing before generating the hypergraph, learning the hypergraph representation, and survival ranking prediction, as illustrated in Fig. 10.5. It is worth noting that these three stages apply to graph-based survival prediction frameworks in general, not just to the ranking-based hypergraph framework.

Fig. 10.5
An illustration has 3 sections. Preprocessing has backbone blocks and leads to hazard prediction with a regression model, that interacts with Survival rank prediction that has ranking-based survival prediction and pairwise critic.

A pipeline of ranking-based survival prediction utilizing hypergraph representation, including pre-processing, hazard prediction via hypergraph representation, and ranking-based survival risk prediction. This figure is from [5]

In the pre-processing stage, N patches are randomly chosen from each histopathological image, and each patch has the same size as a typical natural image (e.g., 224 × 224 pixels). Directly choosing patches at random from the original image, however, is likely to pick up noisy regions as well (e.g., erosion and blank areas). Therefore, before random sampling, the Otsu algorithm [16] is applied to segment the tissue regions with rich information. Next, the patch-level visual features \({\mathbf {X}}^{(0)}\in \mathbb {R}^{N \times F}\) are extracted by a deep neural network pre-trained on ImageNet [11], where F represents the dimension of each patch feature. These raw features retrieved from the pre-trained model reflect the cells and tissues present in each patch and are suitable for describing complex tissue patterns.

Following pre-processing to extract feature information at the patch level, the hypergraph computation approach is used to produce features at the histopathological image level for the subsequent prediction of the survival risk score. Hypergraphs are created using the distance-based hypergraph generation method since, intuitively, cells and tissues with similar morphologies have comparable functionalities. Each patch is regarded as a vertex, and each vertex is considered as the center vertex to generate a hyperedge. This results in a total of N vertices and N hyperedges in the hypergraph reflecting the structural information of the histopathological image. We build each hyperedge using the k nearest neighbor approach, which connects the center vertex to the k vertices whose raw features have the smallest Euclidean distance to it. In this way, the hypergraph incidence matrix H is obtained. Beyond pairwise graph structures, hierarchical grouping patterns can be discovered using the hyperedge structure, which creates a channel for the transfer and integration of information from the k nearest morphological patches. The information fusion among patch vertices is then accomplished using hypergraph convolutional layers, as shown below:

$$\displaystyle \begin{aligned} {\mathbf{X}}^{(l+1)} = \sigma \Big({\mathbf{D}}_v^{-1/2}\mathbf{H} \mathbf{W} {\mathbf{D}}_e^{-1} {\mathbf{H}}^\top {\mathbf{D}}_v^{-1/2}{\mathbf{X}}^{(l)} \varTheta^{(l)} \Big), \end{aligned} $$
(10.32)

where \({\mathbf {X}}^{(l)}\in \mathbb {R}^{N \times C_l}\) is the input feature of the l-th convolution layer with N vertices and \(C_l\) dimensions, \({\mathbf {X}}^{(l+1)}\) is the output feature of the l-th convolution layer, σ stands for the nonlinear activation function, and \(\varTheta^{(l)}\) represents the learnable parameters of the l-th layer. After L layers of convolution, the output \({\mathbf {X}}^{(L+1)}\) of the last layer is used to forecast survival duration, where the N hyperedges might reflect N patterns of causal variables. \({\mathbf {X}}^{(L+1)}\) is squeezed by a pooling layer into \(\mathbf {X}\in \mathbb {R}^{1 \times C_{L+1}}\), which serves as the patient’s representation, and the predicted survival risk score is then regressed using a fully connected neural network. The patient’s actual survival time t can be used to supervise the backpropagation process of the regression.
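A forward pass through Eq. (10.32), followed by pooling and regression, might look like the sketch below. It assumes ReLU as the activation σ, mean pooling as the squeezing operation, a single linear layer as the regression head, and hyperedge weights given as a vector `w`; none of these choices is specified in the text, and the function names are hypothetical.

```python
import numpy as np


def hypergraph_conv(X, H, w, Theta):
    """One hypergraph convolution layer with ReLU as the nonlinearity."""
    d_v = H @ w            # vertex degrees
    d_e = H.sum(axis=0)    # hyperedge degrees
    Hn = H / np.sqrt(d_v)[:, None]
    propagate = (Hn * (w / d_e)) @ Hn.T   # D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}
    return np.maximum(propagate @ X @ Theta, 0.0)


def patient_risk(X0, H, w, thetas, head_w, head_b):
    """Stack the layers, pool the patch features, and regress a risk score."""
    X = X0
    for Theta in thetas:
        X = hypergraph_conv(X, H, w, Theta)
    pooled = X.mean(axis=0)   # squeeze N x C features into one patient vector
    return float(pooled @ head_w + head_b)
```

With a single hyperedge that contains every patch, unit weights, and an identity `Theta`, each output row equals the mean of the input rows, which gives a quick sanity check of the propagation step.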

In addition to the specific survival duration of every single patient, ranking information, which can be used to infer the conditions of nearby patients, is also significant in regression tasks. Moreover, the ranking data portray patients’ relative ranks for high and low risks. The prediction of survival ranking is therefore introduced as the final stage. Since models are otherwise trained on a single image at a time, pairs of histopathological images (i.e., pairs of patients) should be taken into consideration: the inability to distinguish the relative risks of two similar instances is the most frequent reason for inaccurate patient risk comparisons. To fine-tune the model parameters and enhance the accuracy of the predicted ranking, a Bayesian-based method known as Bayesian Concordance Readjust (BCR) is presented. The BCR loss function, which is employed in pairwise training on histopathological images, can be formulated as follows:

$$\displaystyle \begin{aligned} \mathbb{L} = -\log \Big (\delta (\mathbb{W} \cdot ({\mathbf{X}}_i - {\mathbf{X}}_j)) \Big), \end{aligned} $$
(10.33)

where \({\mathbf {X}}_i\) and \({\mathbf {X}}_j\) stand for the feature representations of patients i and j, respectively, and \(\mathbb {W}\) represents the learnable parameters of the regression.
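The pairwise loss of Eq. (10.33) can be illustrated with the short sketch below. The text does not define δ(⋅); the sketch assumes it is the logistic sigmoid, as is common in pairwise ranking losses, and assumes that each pair is ordered so that patient i should receive the higher score. The function name is hypothetical.

```python
import numpy as np


def pairwise_ranking_loss(x_i, x_j, w):
    """-log(sigmoid(w . (x_i - x_j))), assuming delta is the logistic sigmoid."""
    score_gap = float(np.dot(w, x_i - x_j))
    # log(1 + exp(-s)) equals -log(sigmoid(s)) and is numerically stable.
    return float(np.logaddexp(0.0, -score_gap))
```

Under these assumptions, the loss equals log 2 when the two patients receive the same score and decreases as the score of patient i exceeds that of patient j by a larger margin.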

In this subsection, we describe a ranking-based survival prediction method for predicting a patient’s survival hazard score from a single whole-slide image (WSI). The method first extracts informative patches from the WSI and then applies a hypergraph to describe the correlations among patches and to create overall features of the WSI. Finally, the method considers relative ranking information among various patients and achieves better prediction results.

10.3.2 Phenotypic and Topological Hypergraph Modeling

A hypergraph that mines high-order correlations in the data is essential for accurately generating the feature representation of histopathological images. The previously presented ranking-based survival prediction method only employs the nearest neighbor generation method when constructing a hypergraph. This method only fine-tunes image features among patches with similar features and mines high-order relationships from one single perspective, which tends to leave other informative high-order relationships out. Therefore, here we describe a multi-hypergraph-based learning method for survival prediction [6], which efficiently achieves a high-order global representation of the histopathological image by modeling a variety of hyperedge correlations in several spaces with a basic hypergraph convolutional network.

The goal of multi-hypergraph modeling is to uncover topological linkages among patches in image space and high-order connections among patches in latent feature space. The random sampling approach previously employed cannot be used since it is essential to analyze the topological connections in the image space; instead, the sampling is carried out according to the position of the patch in the original image. Therefore, the sampling process uses a boundary-to-center strategy (shown in Fig. 10.6) after the Otsu algorithm [16] filters out noisy regions to produce informative regions of interest. In addition to selecting the border \(\mathbb {B}^1\) and the center \(\mathbb {C}\) of each region of interest, patches are chosen at distance ratios of \(\frac {3}{4}\), \(\frac {1}{2}\), and \(\frac {1}{4}\) from the boundary to the center, i.e., \(\mathbb {B}^{\frac {3}{4}}\), \(\mathbb {B}^{\frac {1}{2}}\), and \(\mathbb {B}^{\frac {1}{4}}\) in Fig. 10.6. Patches at the same relative distance from the border within the same region of interest, as well as the centers of different regions, are regarded as correlated in the image space.

Fig. 10.6
A flow diagram. It has input W S I, filter and tile grids, random and topological sampling, visual feature extractor, and low-level feature. There are 5 different topological samplings with boundaries 1, 3 by 4, 1 by 2, 1 by 4, and a center.

Patch sampling and low-level feature extraction. This figure is from [6]

A multi-hypergraph \(\mathbb {G}=(\mathbb {V},\mathbb {E})\) is constructed by joining two sub-hypergraphs, namely a phenotypic sub-hypergraph \(\mathbb {G}_{phe}=(\mathbb {V},\mathbb {E}_{phe})\) created from the latent feature space and a topological sub-hypergraph \(\mathbb {G}_{top}=(\mathbb {V},\mathbb {E}_{top})\) generated from image space, i.e., \(\mathbb {E} = \mathbb {E}_{phe} ~\cup ~\mathbb {E}_{top}\), as shown in Fig. 10.7. Based on the Euclidean distances between extracted patch visual features, as explained in the previous method, the incidence matrix of the phenotypic sub-hypergraph \({\mathbf {H}}_{phe}\) is built using the k nearest neighbor method. In the incidence matrix of the topological sub-hypergraph \({\mathbf {H}}_{top}\), each vertex is linked to its neighbors in the topological space, i.e., the centers of all regions of interest, \(\mathbb {B}^{\frac {1}{4}}\), \(\mathbb {B}^{\frac {1}{2}}\), \(\mathbb {B}^{\frac {3}{4}}\), and the boundaries of each region of interest.

Fig. 10.7
A flow diagram. It has a cuboid with hypergraphs for A path and a circular hypergraph for A topo, from the left to the right. It leads to H topo and H path, respectively. Through Concatenation function, both merge and result in H with a matrix and a hypergraph.

Construction of multi-hypergraph, which contains a phenotypic sub-hypergraph and a topological sub-hypergraph. This figure is from [6]

The standard hypergraph neural network is modified into the hypergraph max-mask convolution with an increased number of hyperedges, which can address the overfitting issue caused by a lack of training data. Each layer’s convolutional process consists of four steps, namely hyperedge feature gathering, max-mask operation, vertex feature aggregation, and vertex feature re-weighting.

In the first step, the features of each hyperedge \(\mathbb {F}_e^{(l)}\) are gathered from the vertices that are directly linked to it, which can be written as a product of H and \({\mathbf {X}}^{(l)}\). The hyperedge features \(\mathbb {F}_e^{(l+1)}\) of the convolutional layer are then produced by performing a max-mask operation on the features excluding the λ dominating hyperedges. In the final two steps, the output vertex features \(\widetilde {\mathbb {F}}_v^{(l+1)}\) are obtained by aggregating the hyperedge features through multiplication with the matrix H and then re-weighting them using a learnable parameter \(\varTheta^{(l)}\). Therefore, all steps of each layer of the hypergraph neural network in the framework are formulated as

$$\displaystyle \begin{aligned} \left \{ \begin{array}{ll} {\mathbf{X}}^{(l+1)} &= \sigma \Big[ ((\mathbf{I}-\mathbf{L}) {\mathbf{X}}^{(l)} + {\mathbf{H}}^{-1} (\mathbf{I}-\mathbf{L}) {\mathbf{X}}^{(\lambda)})\varTheta^{(l)} \Big] \\ \mathbb{F}_e^{(l+1)} &= {\mathbf{H}}^{-1} (\mathbf{I}-\mathbf{L}) {\mathbf{X}}^{(l)} + {\mathbf{X}}^{(\lambda)} \end{array} \right. , \end{aligned} $$
(10.34)

where \({\mathbf {X}}^{(\lambda)}\) stands for an offset matrix containing only the data from the dominant λ hyperedges, and the term \({\mathbf {H}}^{-1}(\mathbf {I}-\mathbf {L}){\mathbf {X}}^{(\lambda)}\) ensures that computing gradients and adjusting vertex features have no impact on the top λ hyperedges.

With two learnable weight vectors, the vertex feature matrix \({\mathbf {X}}^{(L+1)}\) and the hyperedge matrix \(\mathbb {F}_e^{(L+1)}\) of the final layer are squeezed into feature vectors. The feature fusion module then merges the two vectors to establish a global feature representation that represents the entire hypergraph, i.e., the histopathological image for the regression task.

In this section, we introduce a general framework and a ranking-based optimization method for the task of survival prediction using histopathological images. The survival prediction challenges are then further addressed by replacing the single nearest neighbor modeling algorithm with the multi-hypergraph modeling method. The transformer network is a commonly used model for long sequential data, and histopathological images also include a significant quantity of sequential topological information, making it conceivable to incorporate transformers into the survival prediction task. Therefore, future work can attempt to include transformers in the feature extraction or hypergraph construction components of the framework.

10.4 Drug Discovery

Predicting drug–target interactions (DTIs) is a critical step in the process of discovering new drugs to treat diseases. Nevertheless, the commonly used biochemical experimental methods in wet laboratories are costly and tedious. The development of computational methods for drug discovery, of which machine learning based methods are among the most promising, has been prompted by the growing need for low-cost, effective, and efficient DTI prediction methods. The core idea of these methods is that similar targets may be linked with similar drugs, and vice versa. This assumption de facto implies potential high-order associations between drugs and targets, especially when considering the complex heterogeneous biological networks that contain different biological entities such as proteins.

In the DTI network, one single drug may interact with a group of targets, which can be generalized as a “one-to-many” pattern. When it comes to the aforementioned heterogeneous biological networks, the interactions between these biological entities become more complex, emerging as the “many-to-many” pattern. The hypergraph structure, which can naturally model high-order correlations owing to its flexible hyperedge, is suitable for modeling such a complex heterogeneous biological network. It can conveniently incorporate multiple complex interactions between different biological entities and further utilize the hypergraph computing technique to learn the correlations.

In this section, we present a heterogeneous hypergraph learning method for the DTI prediction (HHDTI) task [7]. The overall pipeline of the framework is illustrated in Fig. 10.8. It takes into consideration different types of interactions between biological entities (e.g., drug–target, drug–disease, and target–disease interactions) to facilitate DTI predictions.

Fig. 10.8
An illustration. The drug-target and target-drug interactions lead to the vertex and hyperedge encoders, resulting in drug and target-disease associations. These lead to low-dimensional drug and target embedding, via hypergraph convolutions, resulting in reconstruction.

An illustration of the HHDTI framework. This figure is from [7]

(1) Heterogeneous Hypergraph Modeling

The overall procedure for modeling biological networks into a heterogeneous hypergraph is illustrated in Fig. 10.9. Given a heterogeneous biological network with different kinds of biological entities and interactions among these entities, the goal of hypergraph modeling is to characterize the heterogeneous biological network as a heterogeneous hypergraph \(\mathbb {G}=(\mathbb {V}, \mathbb {E})\). Here \(\mathbb {V}=\{\mathbb {V}_1 \cup \mathbb {V}_2 \cup \ldots \cup \mathbb {V}_{o}\}\) indicates the vertex set, and \(\mathbb {E}=\{\mathbb {E}_1\cup \mathbb {E}_2\cup \ldots \cup \mathbb {E}_{r}\}\) is the hyperedge set, where o and r are the numbers of entity types and interaction types, respectively. Specifically, we have \(\mathbb {V}_o=\{v_1, v_2, \ldots ,v_{M_o}\}\) with \(M_o\) vertices and \(\mathbb {E}_r=\{e_1, e_2, \ldots ,e_{N_r}\}\) with \(N_r\) hyperedges.

Fig. 10.9
A diagram has a heterogeneous biological network in the center, with low-order pairwise correlations as edges, and drug, target, and disease as vertices. Their interactions result in hypergraphs with high-order topological correlations as hyperedges.

The overall procedure for modeling biological networks into a heterogeneous hypergraph. This figure is from [7]

In the heterogeneous biological network discussed here, the set of entity types O contains drug, target, and disease. The set of interaction types R includes dr–ta, ta–dr, dr–di, and ta–di interactions, where dr, ta, and di abbreviate drug, target, and disease, respectively. Therefore, o is equal to 3 and r is equal to 4.

Moreover, on the basis of the overall heterogeneous hypergraph, multiple sub-hypergraphs can be constructed, with one sub-hypergraph corresponding to one type of correlation. Therefore, four sub-hypergraphs are acquired in all, i.e., four incidence matrices, which are denoted as \(\mathbf {H} \in \mathbb {R}^{M \times N_j}, j\in [1,r]\), where M is the number of vertices of the two types involved in the correlation. Specifically, the four incidence matrices generated based on R are defined as \(\left ({\mathbf {H}}_{d r-t a}, {\mathbf {H}}_{t a-d r}, {\mathbf {H}}_{d r-d i}, {\mathbf {H}}_{t a-d i}\right )\). Figure 10.10 shows an example of a drug hypergraph.

Fig. 10.10
A set of 2 drug hypergraphs. The left hypergraph has 6 vertices representing drugs, connected by 5 hyperedges. The right hypergraph groups similar drugs within 3 hyperedges.

An example of a drug hypergraph. Each vertex on the hypergraph represents a drug, and each hyperedge connects all the drugs that share the same target
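The drug hypergraph of Fig. 10.10 can be assembled from a list of interaction pairs, as sketched below with made-up toy pairs. The helper `incidence_from_pairs` is a hypothetical name, and treating \({\mathbf {H}}_{t a-d r}\) as the transpose of \({\mathbf {H}}_{d r-t a}\) is our reading of the symmetric construction rather than something stated explicitly in the text.

```python
import numpy as np


def incidence_from_pairs(pairs, n_vertices, n_hyperedges):
    """Build H with H[v, e] = 1 for every (vertex, hyperedge) pair."""
    H = np.zeros((n_vertices, n_hyperedges))
    for v, e in pairs:
        H[v, e] = 1.0
    return H


# Toy drug-target interactions as (drug index, target index) pairs.
interactions = [(0, 0), (1, 0), (2, 1), (0, 1)]

# Drugs are vertices; each target defines one hyperedge over its drugs.
H_dr_ta = incidence_from_pairs(interactions, n_vertices=3, n_hyperedges=2)

# Assumed symmetric view: targets as vertices, one hyperedge per drug.
H_ta_dr = H_dr_ta.T
```

The dr–di and ta–di incidence matrices can be built in the same way from the corresponding association pairs.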

(2) Drug and Target Embedding Learning

The same framework is used to create the overall embeddings for both drugs and targets. We now briefly introduce how this framework learns drug and target embeddings.

The overall embeddings are acquired by combining the main embeddings and the assisted embeddings. In particular, the main embeddings, which are learned from direct DTIs, provide the primary vectorized representations for all drugs and targets. In contrast, the assisted embeddings offer supplementary information discovered through disease-relevant data, such as dr–di and ta–di connections.

We first take drugs as an example to demonstrate the learning framework. The drug main embeddings \(\boldsymbol {\varPhi }^k_d\) are learned from \({\mathbf {H}}_{d r-t a}\) using an unsupervised Bayesian deep generative model, i.e., a hypergraph variational auto-encoder, while the drug assisted embeddings are generated from \({\mathbf {H}}_{d r-d i}\) by leveraging hypergraph neural networks (HGNN) [17]. For learning the main embeddings, given the DTI sub-hypergraph structure \({\mathbf {H}}_{d r-t a}\), the Bayesian deep generative model serves as a vertex encoder [18] to explore the potential associations between drugs linked with one target. This method applies a nonlinear mapping to transform the hypergraph structure \({\mathbf {H}}_{d r-t a}\) from the observed space into the shared space \(\boldsymbol {\varPhi }_{d r-t a}^{\prime }\) as

$$\displaystyle \begin{aligned} \boldsymbol{\varPhi}_{d r-t a}^{\prime}=f\left({\mathbf{H}}_{d r-t a} {\mathbf{W}}_{d r-t a}+{\mathbf{b}}_{d r-t a}\right), \end{aligned} $$
(10.35)

where the activation function f(⋅) is nonlinear.

The hyperbolic tangent \(\tanh (x) = (\exp (x) - \exp (-x)) / (\exp (x) + \exp (-x))\) is used here because of its analytic form and efficiency. The learnable weight and bias are represented by \({\mathbf {W}}_{d r-t a} \in \mathbb {R}^{D_{\text{in }} \times D_{\text{out }}}\) and \({\mathbf {b}}_{d r-t a} \in \mathbb {R}^{D_{\text{out }}}\), where \(D_{\text{in}}\) and \(D_{\text{out}}\) are the corresponding dimensions of \({\mathbf {H}}_{d r-t a}\) and \(\boldsymbol {\varPhi }_{d r-t a}^{\prime }\), respectively. After \(\boldsymbol {\varPhi }_{d r-t a}^{\prime }\) is obtained, two fully connected layers are used to estimate the mean and variance:

$$\displaystyle \begin{aligned} \boldsymbol{\mu}_{d r-t a}=f\left(\boldsymbol{\varPhi}_{d r-t a}^{\prime} {\mathbf{W}}_{d r-t a}^{\mu}+{\mathbf{b}}_{d r-t a}^{\mu}\right) \end{aligned} $$
(10.36)

and

$$\displaystyle \begin{aligned} \boldsymbol{\sigma}_{d r-t a}=f\left(\boldsymbol{\varPhi}_{d r-t a}^{\prime} {\mathbf{W}}_{d r-t a}^{\sigma}+{\mathbf{b}}_{d r-t a}^{\sigma}\right), \end{aligned} $$
(10.37)

where \({\mathbf {W}}_{d r-t a}^{\mu }\), \({\mathbf {W}}_{d r-t a}^{\sigma } \in \mathbb {R}^{D_{\text{out }} \times D}\) and \({\mathbf {b}}_{d r-t a}^{\mu }\), \({\mathbf {b}}_{d r-t a}^{\sigma } \in \mathbb {R}^{D}\) are learnable weights and biases, as before, and D is the embedding dimension. The main embeddings \(\boldsymbol {\varPhi }^k_d\) are then sampled by

$$\displaystyle \begin{aligned} \boldsymbol{\varPhi}_{d}^{k}=\boldsymbol{\mu}_{d r-t a}+\boldsymbol{\sigma}_{d r-t a} \odot \boldsymbol{\varepsilon}, \end{aligned} $$
(10.38)

where ⊙ is the Hadamard product and ε ∼ N(0, I).
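Equations (10.35)–(10.38) can be traced with the following sketch, which assumes that tanh is used as f(⋅) in all three layers (as the text suggests) and that the parameters are stored in a dictionary with hypothetical key names. Note that a tanh output for σ may be negative; the sampling formula still applies, but this is a property of the equations as written rather than a design recommendation.

```python
import numpy as np


def encode_main_embeddings(H, params, rng):
    """Map an incidence matrix to sampled main embeddings.

    H has shape (M, D_in); the returned embeddings have shape (M, D).
    """
    hidden = np.tanh(H @ params["W"] + params["b"])                  # shared space
    mu = np.tanh(hidden @ params["W_mu"] + params["b_mu"])           # mean
    sigma = np.tanh(hidden @ params["W_sigma"] + params["b_sigma"])  # spread
    eps = rng.standard_normal(mu.shape)                              # eps ~ N(0, I)
    phi = mu + sigma * eps                                           # Hadamard product
    return phi, mu, sigma
```

Drawing ε outside the deterministic layers keeps the sampled embedding differentiable with respect to μ and σ, which is the usual motivation for this reparameterization in variational auto-encoders.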

In this way, the high-order structural correlations from the direct DTIs can be captured by the main embeddings. In addition to such direct interactions, other types of interactions can also contribute to DTI prediction, which has been validated by recent studies [19]. For instance, the similarity of phenotypic side effects can indicate whether two drugs share a target [20, 21]. It has also been reported that targets can serve as a connection between drugs and diseases [22]. Motivated by these findings, auxiliary data are integrated into HHDTI to provide complementary information, so as to improve prediction accuracy and handle extreme cases such as the cold-start problem (where only a few DTIs are available).

Specifically, the dr–di and ta–di correlations are considered here in HHDTI, and the embeddings learned from the dr–di incidence matrix \({\mathbf {H}}_{d r-d i}\) are called drug assisted embeddings, which serve as the auxiliary representation for the drug’s main embeddings. The drug assisted embeddings are learned by the HGNN model [17], with which the high-order correlations are encoded as

$$\displaystyle \begin{aligned} \operatorname{Convh}(\mathbf{H}, \mathbf{X} \mid \mathbf{W})=f\left(\left({\mathbf{D}}^{v}\right)^{-1/2} \mathbf{H}\left({\mathbf{D}}^{\boldsymbol{e}}\right)^{-1} {\mathbf{H}}^{\top}\left({\mathbf{D}}^{v}\right)^{-1/2} \mathbf{X} \mathbf{W}\right), \end{aligned} $$
(10.39)

where \({\mathbf {D}}^{v}\) and \({\mathbf {D}}^{e}\) are the degree matrices of vertices and hyperedges, respectively. The corresponding degrees of vertices and hyperedges are \(\left ({\mathbf {D}}^{v}\right )_{k, k}=\sum _{j=1}^{L} {\mathbf {H}}^{k, j}\) and \(\left ({\mathbf {D}}^{e}\right )_{j, j}=\sum _{k=1}^{N} {\mathbf {H}}^{k, j}\), respectively. The matrix W is the learnable weight parameter, and \((\cdot )^{\top }\) is the transposition operator. Specifically, the convolutional layer used to learn the drug assisted embedding \(\varPhi _{d}^{s}\) can be formulated as

$$\displaystyle \begin{aligned} \boldsymbol{\varPhi}_{d}^{s(l)}=\operatorname{Convh}\left({\mathbf{H}}_{d r-d i}, \boldsymbol{\varPhi}_{d}^{s(l-1)} \mid {\mathbf{W}}^{(l-1)}\right), \end{aligned} $$
(10.40)

where \(\varPhi _{d}^{s(l-1)}\), \(\varPhi _{d}^{s(l)}\), and \({\mathbf {W}}^{(l-1)}\) represent the input, output, and trainable weight matrix of the (l − 1)-th layer, respectively. Here, the identity matrix is set as the initial value for X. That is, we have \( \boldsymbol {\varPhi }_{d}^{s(0)}=\mathbf {X}=\mathbf {I}\). To create the overall embeddings, an attention module is used to combine the main embeddings and assisted embeddings into a single shared space. By determining the coefficients \(\omega ^{i}\), the bi-embedding attention fusion process is specifically employed to give various weights to the main embeddings and assisted embeddings:

$$\displaystyle \begin{aligned} \omega^{i}=\frac{\exp \left(f\left(\boldsymbol{\varPhi}^{i} {\mathbf{W}}^{i}+{\mathbf{b}}^{i}\right) \cdot {\mathbf{P}}^{i}\right)}{\sum_{j \in \{k, s\}} \exp \left(f\left(\boldsymbol{\varPhi}^{j} {\mathbf{W}}^{j}+{\mathbf{b}}^{j}\right) \cdot {\mathbf{P}}^{j}\right)}, \end{aligned} $$
(10.41)

where \({\mathbf {W}}^{i} \in \mathbb {R}^{D \times D^{\prime }}\), \({\mathbf {b}}^{i} \in \mathbb {R}^{D^{\prime }} \), and \( {\mathbf {P}}^{i} \in \mathbb {R}^{D^{\prime } \times 1}\) are trainable parameters, and D and \(D^{\prime }\) are the corresponding dimensions. The overall drug embeddings \(\varPhi ^{S}_d\) can then be obtained by

$$\displaystyle \begin{aligned} \varPhi^{S}_d=\omega^{k} \varPhi^{k}_d+\omega^{s} \varPhi^{s}_d. \end{aligned} $$
(10.42)
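The bi-embedding attention fusion of Eqs. (10.41) and (10.42) can be sketched in NumPy as follows. This is a minimal illustration rather than the reference implementation: it reads Eq. (10.41) as a softmax over the two embedding types computed separately for each vertex, and the choice of tanh for f, the function name `attention_fusion`, and the random parameters are all assumptions made for the example.

```python
import numpy as np

def attention_fusion(phi_k, phi_s, params, f=np.tanh):
    """Fuse main (k) and assisted (s) embeddings with softmax weights.

    phi_k, phi_s: arrays of shape (N, D).
    params: dict mapping "k" and "s" to a tuple (W, b, P) with shapes
            (D, D'), (D',) and (D', 1).
    f: nonlinearity; tanh is an assumption, the text only calls it f.
    """
    embeddings = {"k": phi_k, "s": phi_s}
    # Unnormalized attention score per vertex and embedding type, shape (N, 1).
    scores = {i: f(embeddings[i] @ W + b) @ P for i, (W, b, P) in params.items()}
    # Softmax over the two embedding types; subtracting the max is for stability.
    m = np.maximum(scores["k"], scores["s"])
    exp_k, exp_s = np.exp(scores["k"] - m), np.exp(scores["s"] - m)
    omega_k = exp_k / (exp_k + exp_s)
    omega_s = exp_s / (exp_k + exp_s)
    # Weighted sum of the two embeddings, as in Eq. (10.42).
    return omega_k * phi_k + omega_s * phi_s, omega_k, omega_s

# Hypothetical toy sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
N, D, D_prime = 5, 4, 3
phi_k = rng.normal(size=(N, D))
phi_s = rng.normal(size=(N, D))
params = {
    i: (rng.normal(size=(D, D_prime)), np.zeros(D_prime), rng.normal(size=(D_prime, 1)))
    for i in ("k", "s")
}
phi_S, omega_k, omega_s = attention_fusion(phi_k, phi_s, params)
```

Because the two coefficients come from a softmax, they are positive and sum to one for every vertex, so each fused embedding is a convex combination of its main and assisted embeddings.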

The overall embeddings of targets \(\varPhi ^{S}_t\) are generated similarly. The main difference is that \({\mathbf {H}}_{ta-dr}\) and \({\mathbf {H}}_{ta-di}\) are used as inputs here. The target main embeddings \(\varPhi ^{k}_t\) are learned using the same vertex encoder as that of drugs. The HGNN model is also adopted to yield the target assisted embeddings \(\varPhi ^{s}_t\) from the target–disease association hypergraph. Finally, the embedding attention fusion is applied to obtain the overall target embeddings \(\varPhi ^{S}_t\).
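As a rough illustration of the hypergraph convolution used to produce the assisted embeddings, the sketch below applies one layer to a small incidence matrix. It assumes the common HGNN form \(\sigma ({\mathbf {D}}_v^{-1/2} \mathbf {H} {\mathbf {D}}_e^{-1} {\mathbf {H}}^{\top } {\mathbf {D}}_v^{-1/2} \mathbf {X} \mathbf {W})\) with unit hyperedge weights and a ReLU activation; the exact operator is the one given in Eq. (10.39), and the function name `hypergraph_conv` and the toy data are hypothetical.

```python
import numpy as np

def hypergraph_conv(H, X, W):
    """One hypergraph convolution layer (assumed HGNN-style normalization).

    H: (N, L) incidence matrix, H[k, j] = 1 if vertex k lies in hyperedge j.
    X: (N, C_in) input features; W: (C_in, C_out) trainable weights.
    """
    d_v = H.sum(axis=1)  # vertex degrees, the diagonal of D^v
    d_e = H.sum(axis=0)  # hyperedge degrees, the diagonal of D^e
    # Guard against isolated vertices and empty hyperedges (degree 0).
    dv_inv_sqrt = np.where(d_v > 0, 1.0 / np.sqrt(np.maximum(d_v, 1e-12)), 0.0)
    de_inv = np.where(d_e > 0, 1.0 / np.maximum(d_e, 1e-12), 0.0)
    # Normalized propagation operator D_v^{-1/2} H D_e^{-1} H^T D_v^{-1/2}.
    G = (dv_inv_sqrt[:, None] * H * de_inv[None, :]) @ (H.T * dv_inv_sqrt[None, :])
    return np.maximum(G @ X @ W, 0.0)  # ReLU is an assumed activation

# Toy association hypergraph: 4 vertices (e.g., drugs), 2 hyperedges (e.g., diseases).
H = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [0, 1]], dtype=float)
rng = np.random.default_rng(0)
X0 = np.eye(4)                  # identity initial features, as stated in the text
W0 = rng.normal(size=(4, 3))    # hypothetical weights of the first layer
phi_1 = hypergraph_conv(H, X0, W0)
```

With identity features, the first layer simply mixes the rows of the weight matrix according to shared hyperedges, so vertices that co-occur in a hyperedge receive related embeddings.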

(3) Drug–Target Interactions Prediction

The reconstruction space A, from which the DTI predictions are generated, is computed from the inner products of the overall drug and target embeddings. That is, we have

$$\displaystyle \begin{aligned} \mathbf{A}=\operatorname{Sigmoid}\left(\boldsymbol{\varPhi}_{d}^{\boldsymbol{S}}\left(\boldsymbol{\varPhi}_{t}^{\boldsymbol{S}}\right)^{\top}\right), \end{aligned} $$
(10.43)

where Sigmoid(⋅) is the sigmoid function. We then give the variational lower bound \(\mathbb {L}\), which is optimized by

$$\displaystyle \begin{aligned} \mathbb{L}&=\mathbb{E}_{q}\left[\log p\left(\mathbf{A} \mid \boldsymbol{\varPhi}_{d}^{\boldsymbol{S}}, \boldsymbol{\varPhi}_{t}^{\boldsymbol{S}}\right)\right]-\beta\left(\mathrm{KL}\left[q\left(\boldsymbol{\varPhi}_{d}^{k} \mid \mathbf{A}\right) \| p\left(\boldsymbol{\varPhi}_{d}^{k}\right)\right]\right.\\&\quad \left.+\mathrm{KL}\left[q\left(\boldsymbol{\varPhi}_{t}^{k} \mid \mathbf{A}\right) \| p\left(\boldsymbol{\varPhi}_{t}^{k}\right)\right]\right), \end{aligned} $$
(10.44)

where \(\mathrm {KL}[q(\cdot )\|p(\cdot )]\) is the Kullback–Leibler divergence between the distributions q(⋅) and p(⋅), and β weights the two KL terms. Varying β changes the amount of learning pressure applied during training and thus yields different learned representations. Inspired by the variational auto-encoder, Gaussian priors \(p\left (\boldsymbol {\varPhi }_{d}^{k}\right )=\prod _{i} p\left (\varphi _{i}^{d}\right )=\prod _{i} \mathbb {N}\left (\varphi _{i}^{d} \mid 0, \mathbf {I}\right )\) and \(p\left (\boldsymbol {\varPhi }_{t}^{k}\right )=\prod _{j} p\left (\varphi _{j}^{t}\right )=\prod _{j} \mathbb {N}\left (\varphi _{j}^{t} \mid 0, \mathbf {I}\right )\) can be used. Here, \(\mathbb {E}_{q}[\log p(\cdot \mid \cdot )]\) is the expected log-likelihood of the reconstruction space A.
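To make the decoder of Eq. (10.43) and the objective of Eq. (10.44) concrete, the following sketch evaluates the negative lower bound for given embeddings. Several modeling details are assumptions that the text does not fix: a Bernoulli likelihood over the observed interaction matrix, a diagonal Gaussian posterior parameterized by a mean and a log-variance, and a single-sample estimate of the expectation. All function names are hypothetical, and the sampled main embeddings stand in for the fused embeddings to keep the example short.

```python
import numpy as np

def predict_dti(phi_d, phi_t):
    """Reconstruction space A = Sigmoid(phi_d phi_t^T), as in Eq. (10.43)."""
    return 1.0 / (1.0 + np.exp(-(phi_d @ phi_t.T)))

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over all entries."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def negative_lower_bound(A_obs, phi_d, phi_t, post_d, post_t, beta=1.0):
    """Negative of Eq. (10.44); minimizing it maximizes the lower bound.

    A_obs: (n_drugs, n_targets) binary matrix of known interactions.
    post_d, post_t: (mu, logvar) of the assumed Gaussian posteriors.
    """
    A_hat = predict_dti(phi_d, phi_t)
    eps = 1e-12  # avoids log(0)
    # Assumed Bernoulli log-likelihood of the observed interactions.
    log_lik = np.sum(A_obs * np.log(A_hat + eps) + (1 - A_obs) * np.log(1 - A_hat + eps))
    kl = kl_to_standard_normal(*post_d) + kl_to_standard_normal(*post_t)
    return -(log_lik - beta * kl)

# Hypothetical toy data.
rng = np.random.default_rng(0)
n_drugs, n_targets, dim = 4, 3, 2
post_d = (rng.normal(size=(n_drugs, dim)), np.zeros((n_drugs, dim)))
post_t = (rng.normal(size=(n_targets, dim)), np.zeros((n_targets, dim)))
# Reparameterized samples: mean + std * noise.
phi_d = post_d[0] + np.exp(0.5 * post_d[1]) * rng.normal(size=(n_drugs, dim))
phi_t = post_t[0] + np.exp(0.5 * post_t[1]) * rng.normal(size=(n_targets, dim))
A_obs = (rng.random((n_drugs, n_targets)) < 0.3).astype(float)
loss = negative_lower_bound(A_obs, phi_d, phi_t, post_d, post_t, beta=1.0)
```

Setting `beta=0` leaves only the reconstruction term, while larger values pull the posteriors more strongly toward the standard normal prior.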

In this part, we introduce a general hypergraph-based framework for DTI prediction. Note that the framework is restricted neither to these types of complex interactions nor to the DTI prediction task; other types of interactions that may contribute to DTI prediction, or even other tasks involving complex correlations, are also conceivable.

In real-world applications, annotating such biomedical data is expensive and time-consuming. Therefore, self-supervised learning has received much attention recently, since it can mine useful information from the data without labels. Under such circumstances, it is of great significance to further devise self-supervised hypergraph computation methods for DTI prediction.

10.5 Medical Image Segmentation

Hypergraph-based methods also play a crucial role in medical image segmentation. Traditional multi-atlas segmentation (MAS) methods have limitations in segmenting anatomical structures with poor image contrast. To address this, a hypergraph can be used to model the complex within-subject and subject-to-atlas voxel relationships and to propagate the labels on the atlas images to the target subject image.

This method, named hierarchical hypergraph patch labeling (HHPL) [8], characterizes higher-order associations between context features by constructing a hypergraph and casts hypergraph learning as a hierarchical model. In addition, a dynamic label propagation strategy uses reliably identified labels on the subject image to help predict the remaining labels.

Figure 10.11 compares the pairwise relations used by conventional MAS methods with the higher-order, groupwise associations captured by hyperedges. Here, \(p_{i}\) is a subject image voxel, and \(R_{i}(l)\) is a 3-D cube of side length l centered on \(p_{i}\). Image patches are extracted from the target subject image at voxel \(p_{i}\) and from the registered atlas images within the corresponding local neighborhood \(R_{n,i}(l)\). Hyperedges can be constructed similarly between the atlas image voxels and target subject image voxels with the high-level context features from the label probability map.

Fig. 10.11 Comparison of a simple pairwise relationship in the conventional MAS methods and the complex groupwise relationship in hyperedges (with much richer information). This figure is from [8]

In particular, the label of a target subject vertex is influenced by the subject vertices that have already been labeled and by the related atlas vertices with known labels. The label propagation process follows two principles: (1) vertices grouped in the same hyperedge should have the same anatomical label; (2) for vertices with known labels, the difference between the labels before and after label propagation should be as small as possible. Therefore, the objective function of hypergraph learning is defined as follows:

$$\displaystyle \begin{aligned} \arg \min _{\mathbf{f}}\left\{\|\mathbf{y}-\mathbf{f}\|{}_{2}^{2}+\lambda \cdot \varPhi\left(\mathbf{f}, \mathbf{H}, \mathbf{W}, {\mathbf{D}}_{e}, {\mathbf{D}}_{v}\right)\right\}. \end{aligned} $$
(10.45)

The first term penalizes the difference between the initial label vector y and the predicted label vector f. The second term is the hypergraph balance term, defined as

$$\displaystyle \begin{aligned} \begin{array}{l} \varPhi\left(\mathbf{f}, \mathbf{H}, \mathbf{W}, {\mathbf{D}}_{e}, {\mathbf{D}}_{v}\right) \\ \quad =\frac{1}{2} \sum_{e \in \varepsilon} \sum_{v, v^{\prime} \in e} \frac{w(e) h(v, e) h\left(v^{\prime}, e\right)}{\delta(e)}\left(\frac{f(v)}{\sqrt{d(v)}}-\frac{f\left(v^{\prime}\right)}{\sqrt{d\left(v^{\prime}\right)}}\right)^{2}. \end{array} \end{aligned} $$
(10.46)

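The balance term can be written in matrix form as \({\mathbf {f}}^{\top }(\mathbf {I}-\boldsymbol {\varTheta })\mathbf {f}\) with \(\boldsymbol {\varTheta }={\mathbf {D}}_{v}^{-1/2} \mathbf {H} \mathbf {W} {\mathbf {D}}_{e}^{-1} {\mathbf {H}}^{\top } {\mathbf {D}}_{v}^{-1/2}\). We can then determine the optimal \(\hat {\mathbf {f}}\) by differentiating the objective function with respect to f and setting the derivative to zero: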

$$\displaystyle \begin{aligned} \hat{\mathbf{f}}=(\mathbf{I}+\lambda(\mathbf{I}-\boldsymbol{\varTheta}))^{-1} \mathbf{y}. \end{aligned} $$
(10.47)

Having obtained the optimized \(\hat {\mathbf {f}}\), the anatomical label of each subject image voxel follows from the sign of the corresponding entry \(f_{i}\):

$$\displaystyle \begin{aligned} \left\{\begin{array}{ll} \text{ foreground } & f_{i}>0 \\ \text{ background } & f_{i}<0 \end{array}, \quad i=1,2, \ldots,|P|\right. . \end{aligned} $$
(10.48)
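The closed-form solution of Eq. (10.47) and the sign rule of Eq. (10.48) can be checked on a toy hypergraph. The sketch below assumes the standard matrix form \(\boldsymbol {\varTheta }={\mathbf {D}}_{v}^{-1/2} \mathbf {H} \mathbf {W} {\mathbf {D}}_{e}^{-1} {\mathbf {H}}^{\top } {\mathbf {D}}_{v}^{-1/2}\), that every vertex belongs to at least one hyperedge, and that known labels are encoded as +1 and −1 with 0 for unlabeled vertices; the function name `propagate_labels` and the five-vertex example are hypothetical.

```python
import numpy as np

def propagate_labels(H, w, y, lam):
    """Closed-form hypergraph label propagation, Eq. (10.47).

    H: (n_vertices, n_edges) incidence matrix with entries h(v, e).
    w: (n_edges,) hyperedge weights w(e).
    y: (n_vertices,) initial labels: +1 / -1 if known, 0 if unknown.
    lam: trade-off parameter lambda.
    """
    d_v = H @ w              # vertex degree d(v) = sum_e w(e) h(v, e)
    delta_e = H.sum(axis=0)  # hyperedge degree delta(e) = sum_v h(v, e)
    dv_inv_sqrt = 1.0 / np.sqrt(d_v)
    # Theta = D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}
    theta = (dv_inv_sqrt[:, None] * H * (w / delta_e)[None, :]) @ (H.T * dv_inv_sqrt[None, :])
    n = H.shape[0]
    # Solve (I + lam (I - Theta)) f = y rather than forming the inverse.
    f_hat = np.linalg.solve(np.eye(n) + lam * (np.eye(n) - theta), y)
    return f_hat, theta

# Toy example: vertices 0 and 3 are atlas voxels with known labels,
# vertices 1, 2 and 4 are unlabeled subject voxels.
H = np.array([[1, 0],
              [1, 0],
              [1, 0],
              [0, 1],
              [0, 1]], dtype=float)
w = np.ones(2)
y = np.array([1.0, 0.0, 0.0, -1.0, 0.0])
f_hat, theta = propagate_labels(H, w, y, lam=1.0)
labels = np.where(f_hat > 0, "foreground", "background")  # sign rule, Eq. (10.48)
print(np.round(f_hat, 3), labels)
```

In this example the unlabeled vertices 1 and 2 share a hyperedge with the positively labeled vertex 0 and receive positive scores, while vertex 4 shares a hyperedge with the negatively labeled vertex 3 and receives a negative score.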

In summary, the segmentation can be computed iteratively to improve the performance by repeating three steps: (1) hypergraph construction with high-level context features; (2) label propagation on the hypergraph; and (3) refinement of the context features. The segmentation results can be found in Fig. 10.12.

Fig. 10.12 Visual comparison of automatically segmented regions by four methods on a typical subject. This figure is from [8]

10.6 Summary

In this chapter, we introduce four typical applications of hypergraph computation in medical and biological tasks: computer-aided diagnosis, survival prediction, drug–target interaction prediction, and medical image segmentation. In computer-aided diagnosis, three specific applications are covered, i.e., the identification of MCI, medical image retrieval for MCI, and the identification of COVID-19 from CT images. These examples show how to adopt hypergraph computation for classification and retrieval tasks in medical and biological fields. For survival prediction with histopathological images, the demonstrated hypergraph computation techniques can also be extended to similar regression tasks. The introduced paradigm may also be applied to other cases with complicated connections. In summary, these examples demonstrate that the high-order correlations in medical and biological data can be modeled and learned by hypergraph computation, which benefits the corresponding studies. In addition to the aforementioned examples, there are many medical and biological applications that have the potential to be explored with hypergraph computation, such as medical image enhancement and multi-modal fusion.