Introduction

MicroRNAs (miRNAs) are small regulatory non-coding RNAs, approximately 22 nucleotides in length, produced by a variety of cells in the body1. Studies have shown that miRNAs can regulate gene expression2 and influence a wide range of biological processes. Over the past decades, many diseases have been found to be associated with miRNAs3,4,5,6. For example, Hollams et al.7 showed that overexpression of c-myc mRNA is associated with tumours and that binding of IGF-II mRNA to the p62 protein is significant in the pathogenesis of hepatocellular carcinoma. Seko et al.8 showed that prolongation of the IL-2 mRNA half-life is closely related to the development of autoimmune diseases. Subsequently, a growing body of research has found that changes in miRNA expression lead to abnormalities in gene expression and cellular function9,10,11,12 and play a non-negligible regulatory role in pathology. Therefore, in-depth exploration of miRNA expression profiles is of great significance for dissecting pathogenesis and developing new diagnostic and therapeutic options. The majority of miRNAs have been identified through gene chips13 and high-throughput sequencing technologies14, but very few have been confirmed to be associated with diseases. Traditional clinical trials are time-consuming and uncertain, so to systematically reveal the effect of disease on miRNA expression, a large number of computational studies have been designed in recent years to predict the relationships between the two.

Depending on the type of data, existing models fall into two main categories: prediction methods based on a single data type and prediction methods based on multi-view data. Based on the methodology, existing prediction models are mainly classified into four categories: machine learning-based models, scoring function-based models, network topology-based models, and deep learning-based models. Jiang et al.15 calculated the probability of miRNA involvement in disease using a scoring function over the miRNA functional similarity network and the human phenome-microRNAome network, and optimised the results. Wang et al.16 constructed acyclic graphs from miRNA functional similarity and disease semantic similarity, and Chen et al.17 calculated spectral kernels from miRNA and disease Gaussian interaction profile similarities; however, such models ignore the dynamic information of the network and the heterogeneity of the data. Zeng et al.18 proposed a structural perturbation method for bilayer networks, called SPM, which integrates information from multiple sources and comprehensively optimises the unknown predictive associations between miRNAs and diseases based on the network topology. Zhao et al.19,20,21 then proposed miRNA-disease association prediction models based on a series of machine learning methods such as XGBoost, but incorrect selection of negative samples in such methods can adversely affect the prediction results22,23. It bears mentioning that the above methods are based on a single view only, and the known miRNA-disease associations they use are incomplete. To improve the reliability of relationship prediction, information can be extracted from different types of similarity views as features of miRNAs and diseases24. Yin et al.25 proposed NCPLP, which integrates disease semantic similarity and microbial functional similarity and uses network consistency projection and label propagation to evaluate similarity from different perspectives for precise prediction.
Ji et al.26 adopted an end-to-end approach combining representation learning and deep autoencoders to predict association information. SVAEMDA27 integrates multi-source similarities and uses variational autoencoders to train predictors, assessing miRNA-disease associations through reconstruction probabilities and casting the task as a semi-supervised learning problem. Liu et al.28 proposed GCNPCA, exploiting the strength of graph neural networks to capture deep topological information from heterogeneous networks. GATMDA29 uses a graph attention mechanism to assign different weights to miRNA neighbour nodes during aggregation, obtaining higher-order neighbour information from multivariate associations. Zhou et al.30 introduced DAEMKL, which uses multi-kernel learning to construct miRNA similarity networks and disease similarity networks; features extracted from the regression models then serve as inputs to a deep autoencoder that identifies the associations. Jin et al.31 proposed MAMFGAT, which obtains fused embeddings of miRNAs and diseases through multimodal adaptive fusion, effectively combining the complementary information of the two modalities. Liu et al.32 proposed TWMHGT, which obtains miRNA-disease embeddings by extracting information with heterogeneous graph neural networks and decodes them by matrix multiplication. Yang et al.33 applied multiple heterogeneous networks to a graph convolutional network to obtain embedding information from different perspectives, which was fed into a random forest (MGCNRF) to predict potential associations. More recently, HGSMDA34 integrated HyperGCN, constructing a miRNA-disease heterogeneous hypergraph, training a GCN for information aggregation, and evaluating prediction similarity with a Sørensen-Dice loss. Jiao et al.35 proposed MGADAE, which employs a multi-kernel learning algorithm to construct a heterogeneous similarity network and then predicts association scores with a graph-convolution encoder and a bilinear decoder. Although these methods improved miRNA-disease association prediction to a certain extent, different types of associations are still not fully detected and the embedding features are not completely preserved, so insufficient local information is incorporated into the network models.

In this study, we propose MHXGMDA, a computational method for miRNA-disease association prediction based on a multi-layer heterogeneous encoder and a machine-learning decoder. More specifically, MHXGMDA first integrates three similarity knowledge networks (miRNA-miRNA, miRNA-disease, and disease-disease) and constructs biological feature vectors for multi-view encoding from the miRNA and disease semantic similarity matrices and Gaussian similarity matrices. The embedding features of miRNAs and diseases are then extracted with the multi-layer heterogeneous encoder, and the outputs of all layers are spliced to maximise information retention; the spliced features serve as inputs to the XGBoost classifier in the decoding stage to complete the association prediction task. We validated the effectiveness of MHXGMDA on two benchmark datasets using five-fold cross-validation. Experimental results show that MHXGMDA outperforms several state-of-the-art models on several independent metrics.
The main contributions of this paper are summarised below:

  • Considering biological meta-path information. Combining a multi-layer heterogeneous encoder to capture different types of associations provides rich contextual information for encoding the complex relationships between miRNAs and diseases and enhances the reliability of predictions for unknown relationships.

  • Deep fusion of embedding information. Given XGBoost’s significant advantage over most machine learning algorithms in handling embedded features, we use XGBoost as a decoder for the spliced miRNA-disease feature matrices to further enhance the accuracy and stability of prediction.

  • Experimental validation. We conduct comprehensive experiments on two benchmark datasets to verify the validity of MHXGMDA and provide further insight into the model’s predictions through case studies.

Materials

To validate the generalisability of our model, we downloaded two miRNA-disease datasets for benchmarking from references36,37, both derived from the Human MicroRNA Disease Database (HMDD) v3.2. The first dataset (VG-data) comes from the work of VGAMF36 and, after de-duplication of relationships, includes 8968 associations between 788 miRNAs and 374 diseases; the second dataset (DA-data) comes from the work of DAmiRLocGNet37 and includes 15,547 associations between 1041 miRNAs and 640 diseases. In both datasets there are far more negative (unverified) samples than positive samples, and the unknown associations contain a large amount of noise. To reduce the adverse impact of this noise on the prediction results, to ensure a reasonable selection of negative samples, and to balance the dataset, we labelled all confirmed miRNA-disease associations as positive samples (1) and randomly sampled an equal number of negative samples (0) from the remaining unverified pairs.
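As a concrete illustration of this balanced sampling step, the sketch below (with illustrative array names and a toy association matrix, not the released preprocessing code) labels confirmed pairs as positives and randomly draws an equal number of unverified pairs as negatives:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def build_balanced_pairs(assoc: np.ndarray):
    """Label all verified miRNA-disease associations as positives (1) and
    randomly draw an equal number of unverified pairs as negatives (0).

    assoc: binary matrix of shape (n_miRNA, n_disease), 1 = verified association.
    Returns (pairs, labels): pairs is (N, 2) holding (miRNA index, disease index).
    """
    pos = np.argwhere(assoc == 1)                 # confirmed associations
    unknown = np.argwhere(assoc == 0)             # unverified pairs (candidate negatives)
    idx = rng.choice(len(unknown), size=len(pos), replace=False)
    neg = unknown[idx]                            # same number of negatives as positives
    pairs = np.vstack([pos, neg])
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    return pairs, labels

# Toy example on VG-data dimensions (788 miRNAs x 374 diseases)
assoc = np.zeros((788, 374), dtype=int)
assoc[rng.integers(0, 788, 8968), rng.integers(0, 374, 8968)] = 1
pairs, labels = build_balanced_pairs(assoc)
```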

Methods

MHXGMDA framework

MiRNA-disease data are often heterogeneous, comprising different types of entities and complex relationships among them. To fully consider the associations between multiple biological entities while effectively retaining information throughout the encoding-decoding process, and given the strong performance of HGT38 on heterogeneous data, we propose MHXGMDA, a computational method for miRNA-disease association prediction based on a multi-layer heterogeneous encoder and a machine-learning decoder. As shown in Fig. 1, MHXGMDA mainly includes three distinct stages:

  • Multi-view similarity feature extraction. We constructed homogeneous similarity matrices for miRNAs and diseases separately as inputs.

  • Construction of multi-layer heterogeneous graph Transformer. We consider miRNAs and diseases as nodes, and traverse meta-paths in HGT to integrate multiple types of high-level encoding information.

  • Splice matrix classification. We directly splice all the output features of the multi-layer heterogeneous encoder and decode them with the XGBoost classifier to obtain the final predictions.

Figure 1

The overall architecture of the MHXGMDA for predicting miRNA-disease association.

Multi-view similarity feature extraction

Following previous methods39, we apply a Gaussian kernel function to the topology of the miRNA-disease association network to obtain the Gaussian interaction profile kernel similarity of miRNAs; the Gaussian similarity of diseases is obtained in the same way. Together with the miRNA semantic similarity and disease semantic similarity matrices, these multi-view similarity matrices are fused to extract the miRNA-miRNA and disease-disease similarity features:

$$\begin{aligned} A_{m}=mean\left\{ A_{ms},A_{mg}\right\} \end{aligned}$$
(1)
$$\begin{aligned} A_{d}=mean\left\{ A_{ds},A_{dg}\right\} \end{aligned}$$
(2)

where \(S_{ms}\) represents miRNA semantic similarity with matrix form \(A_{ms} \in R^{M\times M }\) and \(S_{mg}\) represents miRNA Gaussian similarity with matrix form \(A_{mg} \in R^{M\times M }\); similarly, \(S_{ds}\) represents disease semantic similarity with matrix form \(A_{ds} \in R^{D\times D }\) and \(S_{dg}\) represents disease Gaussian similarity with matrix form \(A_{dg} \in R^{D\times D }\). The specific calculation methods are detailed in the Supplementary Information. mean denotes the element-wise average of the two matrices and is used to fuse the multi-view similarities into the final miRNA similarity matrix \(A_{m}\) and disease similarity matrix \(A_{d}\).
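The sketch below illustrates one common way to compute the Gaussian interaction profile kernel and the mean fusion of Eqs. (1) and (2); the bandwidth convention and function names are assumptions for illustration and may differ in detail from the procedure in the Supplementary Information.

```python
import numpy as np

def gip_kernel(assoc: np.ndarray) -> np.ndarray:
    """Gaussian interaction profile kernel similarity between the rows of an
    association matrix (pass assoc.T to obtain the column-entity similarity)."""
    norms = np.sum(assoc.astype(float) ** 2, axis=1)
    gamma = 1.0 / np.mean(norms)                      # common bandwidth choice
    sq_dist = norms[:, None] + norms[None, :] - 2 * assoc @ assoc.T
    return np.exp(-gamma * sq_dist)

def fuse(semantic: np.ndarray, gaussian: np.ndarray) -> np.ndarray:
    """Element-wise mean of the semantic and Gaussian similarity views (Eqs. 1-2)."""
    return (semantic + gaussian) / 2.0

# A_mg = gip_kernel(assoc); A_dg = gip_kernel(assoc.T)
# A_m  = fuse(A_ms, A_mg);  A_d  = fuse(A_ds, A_dg)
```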

Construction of multi-layer heterogeneous graph Transformer

Most previous methods fail to capture the dynamic property information of heterogeneous graphs. The design of HGT for heterogeneous graph data makes it a powerful tool for handling complex relationships and structures, so we use HGT to learn node representations that capture latent features between miRNAs and diseases. The process can be divided into three steps. The first is Heterogeneous Mutual Attention, in which the attention weights of the target miRNA node with respect to each neighbouring disease source node are calculated. Specifically:

$$\begin{aligned} Attention_{HGT}\left( n_{d},e_{d,m},n_{m} \right) =\mathop {Softmax}\limits _{{\forall d\in N\left( m \right) }}\left( {\mathop {\Vert }\limits _{i\in \left[ 1,h_{th} \right] }ATT\text {-}head^{i}\left( n_{d},e_{d,m},n_{m} \right) } \right) \end{aligned}$$
(3)
$$\begin{aligned} e^{\left( l \right) }\left[ n_{d} \right] \leftarrow \mathop {Aggregate}\limits _{{\forall m\in N\left( d \right) ,\forall e\in E\left( m,d \right) }}\left( Attention_{HGT}\left( n_{m},e_{m,d},n_{d}\right) \cdot Message_{HGT}\left( n_{m},e_{m,d},n_{d}\right) \right) \end{aligned}$$
(4)
$$\begin{aligned} e^{\left( l \right) }\left[ n_{m} \right] \leftarrow \mathop {Aggregate}\limits _{{\forall d\in N\left( m \right) ,\forall e\in E\left( d,m \right) }}\left( Attention_{HGT}\left( n_{d},e_{d,m},n_{m}\right) \cdot Message_{HGT}\left( n_{d},e_{d,m},n_{m}\right) \right) \end{aligned}$$
(5)
$$\begin{aligned} \sum _{\forall d\in N\left( m \right) } Attention_{HGT}\left( n_{d},e_{d,m},n_{m}\right) =1_{h_{th}\times 1} \end{aligned}$$
(6)

where Attention evaluates the significance of the source node, Message extracts information from the source node, and Aggregate aggregates the neighbourhood information using the attention weights. \(n_{d}\) denotes the encoding of a disease node, \(n_{m}\) denotes the encoding of a miRNA node, and \(e_{d,m}\) denotes the edge from the source node (disease) to the target node (miRNA).

For the i-th attention head \(ATT\text {-}head^{i}\left( n_{d},e_{d,m},n_{m} \right)\), we project the source node \(n_{d}\) of type \(\tau \left( n_{d} \right)\) to generate the i-th key vector \(K^{i}\left( n_{d} \right)\). This linear projection uses the function \(K\text {-}Linear_{\tau \left( n_{d} \right) }^{i} :R^{dim}\rightarrow R^{\frac{dim}{h_{th}}}\), where \(h_{th}\) denotes the number of attention heads and \(\frac{dim}{h_{th}}\) the vector dimension of each head. More specifically, to cope with different meta relations, we prepare different mapping matrices, and \(K\text {-}Linear_{\tau \left( n_{d} \right) }^{i}\) is indexed by the type \(\tau \left( n_{d} \right)\) of the source node \(n_{d}\), which maximally preserves the unique features of the various relationships and accurately reflects the relationships between different node types. Similarly, the target node \(n_{m}\) is linearly projected by \(Q\text {-}Linear_{\tau \left( n_{m} \right) }^{i}\) to generate the i-th query vector, which helps capture the associations between source and target nodes more precisely. The specific calculation formulas are as follows:

$$\begin{aligned} ATT\text {-}head^{i}\left( n_{d},e_{d,m},n_{m} \right) =\left( K^i\left( n_{d} \right) W_{\phi \left( e_{d,m} \right) }^{ATT}Q^{i}\left( n_{m} \right) ^{T} \right) \cdot \frac{\mu _{\left\langle \tau (n_d),\phi (e_{d,m}),\tau (n_m) \right\rangle }}{\sqrt{dim}} \end{aligned}$$
(7)
$$\begin{aligned} K^i\left( n_d\right) = K\text {-}Linear_{\tau (n_d)}^i\left( e^{(l-1)}\left[ n_d\right] \right) \end{aligned}$$
(8)
$$\begin{aligned} Q^i\left( n_m\right) = Q\text {-}Linear_{\tau (n_m)}^i\left( e^{(l-1)}\left[ n_m\right] \right) \end{aligned}$$
(9)

Since different meta-paths contribute to the target node to different degrees, for each meta-path triad we set a prior importance weight \(\mu _{\left\langle \tau (n_d),\phi (e_{d,m}),\tau (n_m) \right\rangle }\), which serves as an adjustment factor for attention. In order to integrate enough information from different source nodes, for each target node m, we collect all the attention vectors from its neighbours \(N\left( m \right)\).
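For concreteness, the following sketch implements Eqs. (3) and (7)-(9) for a single ⟨disease, edge, miRNA⟩ meta relation; the layer sizes, parameter names, and per-head scaling are illustrative assumptions rather than the exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, heads = 64, 8
d_head = dim // heads

k_linear_dis = nn.Linear(dim, dim)   # K-Linear for source type "disease" (Eq. 8)
q_linear_mir = nn.Linear(dim, dim)   # Q-Linear for target type "miRNA"   (Eq. 9)
w_att = nn.Parameter(torch.eye(d_head).repeat(heads, 1, 1))  # per-head W^ATT for this edge type
mu = nn.Parameter(torch.ones(1))     # prior weight of the <disease, e, miRNA> meta relation

def mutual_attention(h_dis: torch.Tensor, h_mir: torch.Tensor) -> torch.Tensor:
    """h_dis: (n_dis, dim) embeddings of the disease neighbours of one miRNA node;
    h_mir: (dim,) embedding of the target miRNA. Returns (n_dis, heads) attention weights."""
    K = k_linear_dis(h_dis).view(-1, heads, d_head)        # key vectors, Eq. (8)
    Q = q_linear_mir(h_mir).view(heads, d_head)            # query vectors, Eq. (9)
    scores = torch.einsum('nhd,hde,he->nh', K, w_att, Q)   # K W^ATT Q^T, Eq. (7)
    scores = scores * mu / d_head ** 0.5                   # meta-relation prior and scaling
    return F.softmax(scores, dim=0)                        # Eq. (3): normalise over neighbours
```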

The next part is Heterogeneous Message Passing, which calculates the information contribution of each source node to the target node. Multi-head information merging is adopted: the information of the \(h_{th}\) heads is spliced to give the final representation. The specific calculation is as follows:

$$\begin{aligned} Message_{HGT}\left( n_d,e_{d,m},n_m\right) =\mathop {\Vert }\limits _{i\in \left[ 1,h_{th}\right] }MSG\text {-}head^i\left( n_d,e_{d,m},n_m\right) \end{aligned}$$
(10)
$$\begin{aligned} MSG\text {-}head^i\left( n_d,e_{d,m},n_m\right) ={\text {M-Linear}}_{\tau \left( n_d\right) }^i\left( e^{\left( l-1\right) }\left[ n_d\right] \right) W_{\phi (e_{d,m}) }^{MSG} \end{aligned}$$
(11)

Considering the heterogeneity of different edge types in information propagation, we use a type-specific mapping matrix \({M\text {-}Linear}_{\tau \left( n_d\right) }^i\), indexed by the type \(\tau \left( n_{d} \right)\) of the source node \(n_{d}\), to linearly project the source node and obtain the i-th message vector. \(W_{\phi (e_{d,m}) }^{MSG}\) is the weight matrix associated with the edge \(e_{d,m}\). The information contributions from the source nodes to the target node are computed by the multiple heads and combined into a final representation.
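In the same illustrative style, the message computation of Eqs. (10) and (11) can be sketched as a source-type-specific projection followed by an edge-type-specific transform, with the heads concatenated afterwards; the parameter names are again assumptions.

```python
import torch
import torch.nn as nn

dim, heads = 64, 8
d_head = dim // heads

m_linear_dis = nn.Linear(dim, dim)                           # M-Linear for source type "disease"
w_msg = nn.Parameter(torch.eye(d_head).repeat(heads, 1, 1))  # per-head W^MSG for this edge type

def heterogeneous_message(h_dis: torch.Tensor) -> torch.Tensor:
    """h_dis: (n_dis, dim) source embeddings; returns (n_dis, dim) concatenated multi-head messages."""
    msg = m_linear_dis(h_dis).view(-1, heads, d_head)        # MSG-head^i before the edge transform
    msg = torch.einsum('nhd,hde->nhe', msg, w_msg)           # apply W^MSG_{phi(e)}, Eq. (11)
    return msg.reshape(-1, dim)                              # concatenate the heads, Eq. (10)
```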

The final part is Target-Specific Aggregation. Since the single-head attention results are normalised by a softmax, the attention weights of all source nodes sum to 1, so we can simply use the attention values as weights and take a weighted sum of the messages from all source nodes to obtain the update vector of the target node.

$$\begin{aligned} {\widetilde{e}}^{(l)}[n_m]=\mathop {\oplus } \limits _{\forall d\in N(m)}(Attention_{HGT}(n_d,e_{d,m},n_m)\cdot Message_{HGT}(n_d,e_{d,m},n_m)) \end{aligned}$$
(12)

Similarly, to preserve the heterogeneity of the propagated information, the model applies type-specific linear matrices within the residual connections to produce the updated embeddings of miRNAs and diseases.

$$\begin{aligned} e^{(l)}[n_m]=A\text {-}Linear_{ \tau \left( n_{m} \right) }(\sigma \left( {\widetilde{e}}^{\left( l\right) }\left[ n_m\right] \right) )+e^{(l-1)}[n_m] \end{aligned}$$
(13)
$$\begin{aligned} e^{(l)}[n_d]=A\text {-}Linear_{ \tau \left( n_{d} \right) }(\sigma \left( {\widetilde{e}}^{\left( l\right) }\left[ n_d\right] \right) )+e^{(l-1)}[n_d] \end{aligned}$$
(14)

Finally, the node embeddings output by every HGT layer are concatenated to fuse information across layers. The effect of this feature extraction is visualised in Fig. 2.
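A compact sketch of the resulting multi-layer encoder is given below, using PyTorch Geometric's HGTConv as a stand-in for the heterogeneous layers; the edge types, layer widths, and the preceding type-specific linear projection are illustrative assumptions rather than the exact architecture.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import HGTConv

# Assumed node and edge types of the miRNA-disease heterogeneous graph.
metadata = (['miRNA', 'disease'],
            [('miRNA', 'associates', 'disease'),
             ('disease', 'rev_associates', 'miRNA'),
             ('miRNA', 'similar', 'miRNA'),
             ('disease', 'similar', 'disease')])

class MultiLayerHGTEncoder(nn.Module):
    def __init__(self, in_dim=64, hidden=64, heads=8, num_layers=6):
        super().__init__()
        # type-specific linear projection applied before the heterogeneous layers
        self.lin = nn.ModuleDict({t: nn.Linear(in_dim, hidden) for t in metadata[0]})
        self.layers = nn.ModuleList(
            [HGTConv(hidden, hidden, metadata, heads=heads) for _ in range(num_layers)])

    def forward(self, x_dict, edge_index_dict):
        h = {t: self.lin[t](x).relu() for t, x in x_dict.items()}
        outs = {t: [] for t in h}
        for conv in self.layers:
            h = conv(h, edge_index_dict)       # mutual attention, message passing, aggregation
            for t, v in h.items():
                outs[t].append(v)
        # concatenate the outputs of every layer to retain all intermediate embeddings
        return {t: torch.cat(v, dim=-1) for t, v in outs.items()}
```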

Figure 2

Visualisation of miRNA and disease feature heatmaps. The subplots show the learned representation vectors of miRNAs and diseases, with colours indicating the intensity of the individual feature components.

Splice matrix classification

The miRNA features obtained from the multi-layer heterogeneous encoder are spliced with the disease features to obtain a fusion descriptor for each miRNA-disease pair, as follows:

$$\begin{aligned} Z_{ij}=[F_m\left( i\right) ,F_d\left( j\right) ] \end{aligned}$$
(15)

where \(F_m\left( i\right)\) refers to the vector representation of the i-th miRNA within feature \(F_{m}\) and \(F_d\left( j\right)\) refers to the vector representation of the j-th disease within feature \(F_{d}\).
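A minimal sketch of this splicing step, assuming \(F_m\) and \(F_d\) are the concatenated multi-layer encoder outputs and pairs holds (miRNA, disease) index pairs:

```python
import numpy as np

def splice(F_m: np.ndarray, F_d: np.ndarray, pairs: np.ndarray) -> np.ndarray:
    """pairs: (N, 2) array of (miRNA index i, disease index j);
    returns Z whose rows are the fusion descriptors [F_m(i), F_d(j)] of Eq. (15)."""
    return np.hstack([F_m[pairs[:, 0]], F_d[pairs[:, 1]]])
```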

XGBoost is an efficient gradient boosting framework that fits the data by iteratively adding decision trees, uses regularisation to control model complexity, and supports feature subset selection and feature importance assessment for interpreting the data and the model. When growing each tree, XGBoost uses a gradient-based splitting criterion to select the best split point, improving accuracy and generalisation. XGBoost is widely used in bioinformatics for classification and regression tasks such as gene expression profiling40, disease prediction41, and drug response prediction42. In this study, we use XGBoost as the classifier for the fused descriptors of miRNA and disease embedding features. On the one hand, this minimises the loss of encoded information and improves the accuracy and reliability of association prediction; on the other hand, XGBoost can effectively handle large-scale miRNA and disease data and, through the gradient boosting algorithm and regularisation, optimise performance while reducing the likelihood of overfitting. Consequently, as shown in Fig. 3, our model demonstrates strong generalisation on novel data and enhanced robustness.
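The decoding stage can then be sketched as follows; the XGBoost hyperparameters shown are common illustrative defaults, not the tuned values used in our experiments.

```python
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

def train_decoder(Z, labels):
    """Fit an XGBoost classifier on the spliced pair descriptors Z and return
    association scores for a held-out split (illustrative hyperparameters)."""
    Z_tr, Z_te, y_tr, y_te = train_test_split(
        Z, labels, test_size=0.2, stratify=labels, random_state=0)
    clf = XGBClassifier(n_estimators=500, max_depth=6, learning_rate=0.05,
                        subsample=0.8, colsample_bytree=0.8,
                        reg_lambda=1.0, eval_metric='logloss')
    clf.fit(Z_tr, y_tr)
    return clf, clf.predict_proba(Z_te)[:, 1]   # predicted association scores
```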

Figure 3

Visualization of predicted score matrix and label matrix heatmaps. The subgraphs respectively depict known and anticipated relationships between miRNAs and diseases. In these heatmaps, the rows represent miRNAs while the columns correspond to various diseases.

Experiments and results

To assess the performance of MHXGMDA in miRNA-disease association prediction, we conducted a comparative analysis with seven state-of-the-art baselines on the two benchmark datasets: GATECDA43, MINIMDA44, AMHMDA39, VGAMF36, CGHCN45, HFHLMDA46, and MGADAE35.

GATECDA43 uses a graph attention autoencoder (GATE) to compress the high-dimensional feature information of each entity into low-dimensional representations, and their combination is fed into a fully connected layer to predict associations between RNAs and drug sensitivity.

MINIMDA44 constructs an integrated network from multi-source information, obtains embedding representations of miRNAs and diseases by integrating higher-order neighbourhood information from the multimodal network, and finally uses a multilayer perceptron (MLP) to predict latent associations between miRNAs and diseases.

AMHMDA39 combines the information of multiple similarity networks through an attention mechanism, then introduces supernodes to construct a heterogeneous hypergraph that enriches the node information, and learns miRNA-disease features through graph convolutional networks.

VGAMF36 integrates multiple views of miRNAs and diseases through linear weighted fusion, combines matrix factorisation and a variational autoencoder to extract linear and nonlinear features of miRNAs and diseases, and then predicts potential miRNA-disease associations.

CGHCN45 uses a graph convolutional network to capture initial features of miRNAs and diseases, which are combined with a hypergraph convolutional network to further learn complex higher-order interaction information.

HFHLMDA46 constructs hyperedges over miRNA-disease pairs and their k most relevant neighbours, found with the k-nearest-neighbour (KNN) method, to obtain a hypergraph, and trains a projection matrix to predict the association scores between them.

MGADAE35 predicts correlations between miRNAs and diseases by fusing their similarities using multi-kernel learning. It constructs a heterogeneous network, learns representations through graph convolution, and introduces an attention mechanism to integrate the multi-layer representations.

To ensure fairness in comparing results, all methods utilize identical similarity data, encompassing miRNA semantic and Gaussian similarities, as well as disease semantic and Gaussian similarities. AMHMDA39 incorporates three modalities, while MHXGMDA uses two; to maintain consistency, we utilize the miRNA and disease semantic similarities as the third modality for AMHMDA39. The single-modal models CGHCN45 and HFHLMDA46 are trained solely on the miRNA and disease semantic similarity matrices.

Experimental setup

In order to verify the generalisation ability of the model, we divided the two benchmark datasets into training (80%) and testing (20%) samples. For the training set, we employed 5-fold cross-validation (5-CV) to fine-tune model parameters and structure. During training, we set the hidden channels to 64, attention heads to 8, and epochs to 2000. We employ the Adam optimiser with an optimal learning rate of 0.01 and a weight decay rate of 0.002. Additionally, dropout of 0.5 was applied to randomly omit neurons, preventing overfitting. Evaluation metrics included AUC, PRC, F1-score, accuracy, recall, specificity, and precision. Table 1 summarizes mean values across multiple experiments on the VG-data dataset, where MHXGMDA achieved AUC and PRC scores of 0.9594 and 0.9539, respectively.
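The evaluation loop can be sketched as below, where build_model is a placeholder for training MHXGMDA on one fold and returning association scores for the held-out pairs; the metric set matches the one listed above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import (roc_auc_score, average_precision_score, f1_score,
                             accuracy_score, recall_score, precision_score)

def cross_validate(pairs, labels, build_model, n_splits=5, seed=0):
    """Run stratified 5-CV and return the mean of each evaluation metric."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = []
    for tr_idx, te_idx in skf.split(pairs, labels):
        scores = build_model(pairs[tr_idx], labels[tr_idx], pairs[te_idx])  # predicted probabilities
        pred = (scores >= 0.5).astype(int)
        y = labels[te_idx]
        results.append({
            'AUC': roc_auc_score(y, scores),
            'PRC': average_precision_score(y, scores),
            'F1': f1_score(y, pred),
            'Accuracy': accuracy_score(y, pred),
            'Recall': recall_score(y, pred),
            'Specificity': recall_score(y, pred, pos_label=0),
            'Precision': precision_score(y, pred),
        })
    return {k: np.mean([r[k] for r in results]) for k in results[0]}
```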

Table 1 The 5-fold cross-validation test results of MHXGMDA on VG-DATA.

Furthermore, we also performed model testing on DA-data, as shown in Table 2, the AUC and PRC reached 0.9601 and 0.9545, respectively, demonstrating its superior performance.

Table 2 The 5-fold cross-validation test results of MHXGMDA on DA-DATA.

Parameter discussion

In this study, we learn biological knowledge such as meta-paths in heterogeneous graphs through the Heterogeneous Graph Transformer (HGT) model, and the parameter num-layers controls the number of layers in the HGT. To investigate the impact of num-layers on model performance, we searched over the values 2, 4, 6, 8, and 10 and tested the model on the two benchmark datasets; the results are shown in Fig. 4. In general, adding HGT layers gradually abstracts higher-level feature representations and extracts more biological information, but as the number of layers increases, gradients may vanish or explode during backpropagation, making the model difficult or unstable to train. We found that performance on VG-data is best when num-layers is set to 6, so all other experiments on VG-data use num-layers = 6. We also ran 5-CV on DA-data, where performance is best when num-layers is set to 4; accordingly, all other experiments on DA-data use num-layers = 4.
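The search itself reduces to a small grid, as sketched below; evaluate is a placeholder for running 5-CV at a given depth and returning the mean AUC and PRC.

```python
def search_num_layers(evaluate, candidates=(2, 4, 6, 8, 10)):
    """Evaluate each candidate depth and return the one with the highest mean AUC."""
    results = {L: evaluate(num_layers=L) for L in candidates}   # e.g. {'AUC': ..., 'PRC': ...}
    best = max(results, key=lambda L: results[L]['AUC'])
    return best, results
```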

Figure 4

Parameter analysis for num-layers. The plots show the AUC and PRC values corresponding to 2, 4, 6, 8 and 10 heterogeneous layers on the two benchmark datasets; on each axis, larger values lie further from the centre point.

Classifier selection

To select the classifier best suited to the MHXGMDA framework in the decoding phase, this section tunes the relevant parameters of seven machine learning models, XGBoost, SVM, Random Forest, KNN, Decision Tree, Logistic Regression, and Naive Bayes, and evaluates their performance under 5-CV. On VG-data, our model achieves the highest AUC of 0.9594 when XGBoost is used as the classifier, and the remaining evaluation metrics also reach high levels compared with the other classifiers. We also performed 5-fold cross-validation on DA-data, where XGBoost again achieves the highest AUC of 0.9601, 0.0066 higher than SVM, which has the second-highest score. The average experimental results are shown in Tables 3 and 4, where SVM stands for Support Vector Machine, RF for Random Forest, KNN for K-Nearest Neighbours, LR for Logistic Regression, DT for Decision Tree, and NB for Naive Bayes.

Table 3 The evaluation indicators of MHXGMDA with different classifiers on VG-DATA.
Table 4 The evaluation indicators of MHXGMDA with different classifiers on DA-DATA.

In conclusion, within our framework XGBoost extracts information from the spliced miRNA-disease features more effectively than the other classifiers; therefore, we chose XGBoost as the classifier in the MHXGMDA framework.

Comparative analysis of performance with other models

We compare MHXGMDA with seven state-of-the-art models on the two benchmark datasets, training all methods with 5-CV. Figure 5 shows the AUC and PRC of each model.

Figure 5

ROC curves and PRC curves plotted by utilizing the cross validation results of different models. ROC curve represents Receiver Operating Characteristic Curve, PRC curve represents Precision-Recall Curve.

Tables 5 and 6 show the average AUC, PRC, F1-score, accuracy, recall, specificity, precision, and running time per epoch for each model. Although MHXGMDA does not outperform MGADAE in specificity, precision, or running time, it performs better on all other metrics, with average AUCs 0.0138 and 0.0106 higher than MGADAE on VG-data and DA-data, respectively. Across multiple cross-validations, MHXGMDA attained the highest AUC and AUPR values among the compared methods, validating its superiority in association discovery.

Table 5 Performance of MHXGMDA with other seven models on VG-DATA.
Table 6 Performance of MHXGMDA with other seven models on DA-DATA.

Ablation experiments with different network architectures

To verify the effectiveness of heterogeneous graph representation encoding in the MHXGMDA framework, we propose three model variants, MHXGMDA-w/o Last, MHXGMDA-used HAN, and MHXGMDA-w/o Linear, which probe the roles of the one-dimensional splicing network layer, the heterogeneous graph Transformer, and the linear layer, respectively. MHXGMDA-w/o Last excludes the one-dimensional splicing network layer, MHXGMDA-used HAN replaces the Heterogeneous Graph Transformer (HGT) with the Heterogeneous Graph Attention Network (HAN), and MHXGMDA-w/o Linear removes the linear layer in the feed-forward neural network that precedes the multi-layer heterogeneous encoder. As shown in Fig. 6 and Tables 7 and 8, MHXGMDA outperforms the three variants, which achieve AUCs of 0.8886, 0.9212, and 0.9135 on VG-data and 0.9578, 0.9595, and 0.9572 on DA-data, respectively. In addition, MHXGMDA-used HAN significantly outperforms MHXGMDA-w/o Last on all evaluated metrics, indicating that the one-dimensional splicing network layer is essential for fully learning node representations, while the further gain of MHXGMDA over MHXGMDA-used HAN shows that the multi-layer heterogeneous graph Transformer additionally enhances model performance.

Figure 6

Ablation experiment results on different network architectures of MHXGMDA. Mean denotes the mean, STD denotes the standard deviation, and the vertical axis denotes the values of the evaluation metrics under each variant.

Table 7 Ablation experiment results on different network architectures of MHXGMDA on VG-DATA.
Table 8 Ablation experiment results on different network architectures of MHXGMDA on DA-DATA.

Ablation experiments with different views

To assess the rationality of including multimodal training data in MHXGMDA, we implemented two variants of the model that each ignore one modality, MHXGMDA-w/o SS and MHXGMDA-w/o GS. Specifically, MHXGMDA-w/o SS is trained without the miRNA and disease semantic similarity matrices, while the training data of MHXGMDA-w/o GS excludes only the miRNA and disease Gaussian similarity matrices. The experimental results are shown in Fig. 7 and Tables 9 and 10. On both datasets, almost all metrics of the full MHXGMDA model are significantly better than those of the single-modal variants MHXGMDA-w/o SS and MHXGMDA-w/o GS, which implies that combining multimodal data is important for predicting miRNA-disease relationships.

Figure 7

Ablation experiment results on different views of MHXGMDA. The figure shows four metric values of the variant models under the two benchmark datasets, where the training data of w/o SS includes only the Gaussian similarity matrices, w/o GS includes only the semantic similarity matrices, and Ours includes both.

Table 9 Ablation experiment results on different views of MHXGMDA on VG-DATA.
Table 10 Ablation experiment results on different views of MHXGMDA on DA-DATA.

Case study

To evaluate the accuracy of MHXGMDA in predicting miRNA-disease associations in real cases, we chose three diseases, Lung Neoplasms, Hepatocellular Carcinoma, and Glioblastoma, as case study subjects. First, we removed all associations between miRNAs and these three diseases from the training data; the model's ability to recover the deleted associations was then evaluated during prediction. Next, we ranked the association scores predicted by MHXGMDA for the miRNAs related to each of the three diseases and selected the top 20 miRNAs. For brevity, HMDD v4.0 is abbreviated as 'H4' in Table 11.
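The case-study protocol can be sketched as follows; train_and_score is a placeholder for retraining the model on the masked associations and returning the full score matrix.

```python
import numpy as np

def top_k_for_disease(assoc, disease_idx, train_and_score, k=20):
    """Hide all known associations of one disease, re-score every miRNA for it,
    and return the indices of the top-k candidate miRNAs."""
    masked = assoc.copy()
    masked[:, disease_idx] = 0                        # delete known associations of this disease
    scores = train_and_score(masked)[:, disease_idx]  # predicted scores for every miRNA
    return np.argsort(-scores)[:k]
```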

Table 11 Top 20 miRNA-disease associations predicted by MHXGMDA.

Numerous studies have demonstrated a close association between alterations in miRNA expression levels and the progression of diverse diseases. The first disease considered is lung tumours, one of the common malignant tumours; the co-expression of hsa-miR-182 and hsa-miR-126 helps to differentiate between primary lung tumours and lung metastases47. The second disease in the case study is hepatocellular carcinoma, one of the deadliest forms of cancer in the world; decreased serum levels of miR-16 and miR-199a expression show a robust linkage with the progression of hepatocellular carcinoma48. In addition, glioblastoma is one of the most common fatal brain tumours, in which the tumour compresses, infiltrates, and destroys brain tissue, leading to local symptoms and neurological impairment; the microRNA-302-367 cluster has been shown to effectively lead to the destruction of glioma-initiating cells and their tumorigenic properties49. Nearly all of the candidate associations predicted by the model could be verified, which demonstrates the excellent performance and reliability of MHXGMDA in exploring miRNA-disease associations in practice.

Finally, we took the three diseases above as central nodes and constructed a sub-network from the miRNAs ranked in the top ten of their respective scores. As shown in Fig. 8, it is striking that lung tumours and glioblastoma share the largest number of miRNAs among their top-ten scores. It is even more noteworthy that, despite these commonalities at the miRNA level, lung tumours and glioblastomas show relatively little similarity in disease characteristics, suggesting that the same miRNAs may play different roles in different diseases.

Figure 8

miRNA-disease association subnetwork. The red nodes represent the three diseases and the blue nodes represent the top 10 miRNAs associated with each disease.

Conclusion

In this work, we propose MHXGMDA, a computational method for miRNA-disease association prediction based on a multi-layer heterogeneous encoder and a machine-learning decoder. Compared with existing prediction models, MHXGMDA not only captures different types of biological knowledge from the multi-view similarities of miRNAs and diseases, but also incorporates a multi-layer heterogeneous graph Transformer at the encoding stage to explore the dynamic information of miRNA-disease associations. In the decoding stage, MHXGMDA applies XGBoost to learn the key miRNA-disease features from the multi-layer HGT outputs and deeply fuse the embedding information. Finally, we tested the model on two association datasets. The experiments demonstrate that our model surpasses state-of-the-art methods in exploring overlooked miRNA-disease associations, validating the proficiency of MHXGMDA in identifying miRNA-disease associations and helping to pioneer new disease diagnosis and treatment options. However, negative samples with sufficient experimental evidence of weak correlation between miRNAs and diseases are difficult to collect, and the random sampling of negative samples in the MHXGMDA datasets may have unintended negative impacts. In future work, we will construct a balanced dataset with more reliable negative samples to further optimise the prediction of miRNA-disease associations.

Key Points

  • We construct a multi-layer heterogeneous graph Transformer model based on similarity matrices from multiple views, covering miRNA semantic similarity, disease semantic similarity, miRNA Gaussian interaction profile kernel similarity, and disease Gaussian interaction profile kernel similarity. In the heterogeneous graph Transformer, we traverse different meta-paths with miRNAs and diseases as nodes to capture richer dynamic information.

  • In order to fully aggregate miRNA-disease embedding features, we spliced all the representational matrices of the multilayer HGT outputs as inputs to the XGBoost machine learning model, making maximum use of the encoding-decoding process information.

  • We applied MHXGMDA to compute the association scores of missing miRNAs with diseases under 5-CV. The findings reveal that, compared with other advanced methods, our method provides a more promising approach to predicting associations between miRNAs and diseases.