Background

Proteins are essential macromolecules in living organisms, and most proteins fold into specific, ordered three-dimensional conformations to perform their functions. Intrinsically disordered proteins and regions (IDPs/IDRs) are a special class of proteins or regions that lack a stable folded structure under native physiological conditions. Despite lacking well-defined tertiary structures, IDPs/IDRs play essential roles in a wide range of biological processes, such as cell signaling [1], DNA regulation [2], and post-translational modification [3]. IDPs/IDRs are also associated with many human diseases [4], including neurodegenerative diseases [5, 6], diabetes [7], cancer [1, 8], and cardiovascular disease [9, 10]. The structural flexibility of IDRs enables them to bind many molecular ligands, which makes them promising drug targets [11]. Therefore, identifying disordered regions in proteins and understanding their functional roles will contribute to rational drug design and improve the efficiency of new drug development [12, 13].

Experimental characterization of IDPs/IDRs in the wet lab is expensive and labor-intensive. With the massive growth in the number of protein sequences available in databases [14], computationally predicting IDPs/IDRs directly from sequences is a feasible alternative. Numerous computational methods that leverage different sequence features and computing techniques have been developed for identifying IDRs in proteins, such as SPOT-disorder [15], DISOPRED3 [16], SPINE-D [17], AUCpreD [18], IDP-Seq2seq [19], SPOT-Disorder2 [20], and fIDPnn [26]. Their predictive quality has been comprehensively evaluated in the community-driven Critical Assessment of protein Intrinsic Disorder (CAID) [21]. The first edition of CAID (CAID1) [21] evaluated a total of 32 disorder predictors, and the recently completed second round (CAID2) [22, 23] evaluated a total of 46 computational methods.

IDPs/IDRs perform multiple critical functions in living organisms [24]. These functions can be broadly classified into two categories: binding functions, which arise from interactions with partners, and non-binding functions, which originate from native structural flexibility [24, 25]. Many computational methods have been developed to predict binding regions in IDRs, including methods for identifying protein-binding sites [26,27,28,29,30,31], DNA-binding sites [26, 29, 30], RNA-binding sites [26, 29, 30], and lipid-binding sites [32]. Several predictors [33,34,35] are available for identifying molecular recognition features (MoRFs) within IDRs, which are disordered regions that bind to target protein domains through a disorder-to-order transition. Linkers are the primary function of the non-binding category: they connect multiple structured domains and permit domain movements between catalytic sites [36, 37]. Methods [38,39,40] for identifying disordered flexible linkers (DFLs) from protein sequences have been developed. In addition, a single IDR can bind different ligands to perform multiple functions, and several prediction tools, such as DisoRDPbind [29] and DeepDISOBind [30], provide predictions for multiple types of disordered binding regions, including IDRs involved in protein, DNA, and RNA binding. fIDPnn [26] predicts both the binding and non-binding functions of IDRs. Building on these efforts in disordered function prediction, CAID1 included the assessment of disordered binding regions [21], and CAID2 extended the evaluation to disordered linkers [22, 23]. As the CAID results indicate, there is still substantial room for improvement in current predictors, which face two main limitations. (1) Insufficient coverage of functional predictions: IDRs perform multiple functions, so predictors covering more functional categories are required. (2) Neglected functional correlations: the multiple functions of IDPs/IDRs are dependent and interrelated, but current methods do not take these correlations into account, which limits their predictive accuracy.

Biological sequences and natural languages share similarities at three hierarchical levels. (1) Genetic similarity: language ability in biological organisms, including humans, is linked to specific genes [41]. Both the origin of language and the evolution of biological species stem from genetic inheritance and variation. (2) Evolutionary similarity: biological organisms and natural languages share similar evolutionary mechanisms. Natural language is a uniquely human characteristic, and both the development of language and the evolution of species are directed by natural selection [42]. (3) Formal similarity: biological sequences exhibit arrangement rules and combination patterns similar to those observed in natural languages [43]; for example, the frequency of words in a language and the frequency of domains in a proteome follow the same form of Zipf’s law [44]. These similarities provide a basis for applying natural language processing (NLP) techniques to the analysis of biological sequences [45,46,47]. The protein language model (PLM) is one of the most representative of these approaches [48,49,50]. Its capability to capture semantic information on protein sequence, structure, and function [51] has shown significant potential in a series of studies, including protein design [52,53,54] and protein function prediction [55]. In this study, we investigated how to incorporate protein semantic knowledge to facilitate computational prediction of disordered regions and their functions.

Here, we describe DisoFLAG, a computational method for jointly predicting disorder and multiple disordered functions. DisoFLAG employs a graph-based interaction protein language model (GiPLM) to provide six functional predictions for intrinsic disorder: protein binding, DNA binding, RNA binding, ion binding, lipid binding, and flexible linker (see Fig. 1a). GiPLM integrates protein semantic information obtained from a pre-trained protein language model into graph-based interaction units. Each graph-based interaction unit models the multiple disordered functions as a graph to learn semantic correlation features among them. The propensity scores for disorder and the six functions are then calculated from the semantic correlation features aggregated by a graph convolutional network (GCN) layer (see Fig. 1b). Following CAID, we evaluated DisoFLAG on the CAID2 datasets and on two independent test datasets built from the latest DisProt database. The results demonstrated that DisoFLAG achieves competitive performance in predicting disorder and outperforms existing tools in predicting most disordered functions. We provide a standalone package and a convenient web server for DisoFLAG.

Fig. 1

Schematic overview of DisoFLAG. a DisoFLAG provides predictions of six functions for intrinsically disordered regions in proteins. Jointly predicting the six functional regions yields lower information entropy than predicting them individually. This reduction in information entropy is known as information gain (IG) and reflects the correlation between different functions: a higher IG indicates a stronger correlation. b The graph-based interaction protein language model (GiPLM) architecture employed in DisoFLAG. A bi-directional gated recurrent unit (Bi-GRU) layer captures protein contextual semantic information from residue embeddings extracted by the pre-trained protein language model. A subsequent attention-based gated recurrent unit (GRU) layer models global correlations across the sequence and produces a hidden representation for each residue. Feature mapping layers then generate six function embedding vectors (Xi) for each residue. For each residue, the graph-based interaction unit models the six functions and their correlations as a functional graph, using the function embedding vectors (Xi) as node representations and the pre-calculated IG matrix as the weighted adjacency matrix for graph edges. Finally, the propensity scores for disorder and the six disordered functions are calculated from the semantic correlation features aggregated on the functional graph by the graph convolutional network (GCN) layer

Methods

Benchmark dataset of disorder functions

The DisProt [56,57,58] database provides functional annotations of intrinsically disordered proteins/regions (IDPs/IDRs) following the Intrinsically Disordered Proteins Ontology (IDPO) and Gene Ontology (GO) schemas. We examined all ontology terms in DisProt and obtained collections of functional annotation terms for protein, DNA, RNA, ion, and lipid binding and for flexible linkers (Additional file 1: Table S1). Following previous studies [21, 29, 59], we annotated each functional class by collecting all of its sub-terms. We extracted all functionally annotated proteins from DisProt 9.3. To obtain high-quality data, we removed sequences whose functional regions lacked disordered-structure annotations. We also excluded sequence DP00072, which was too long (> 30,000 residues) to be processed by the protein language model. This left 925 sequences, which were split into training, validation, and independent test datasets. Following the protocols of previous studies [26, 32], we clustered the 925 sequences using the CD-HIT algorithm [60] at 25% sequence similarity. We then randomly divided the clusters into five subsets: three subsets (589 sequences) formed the training dataset, one subset (148 sequences) formed the validation dataset, and the remaining subset (188 sequences) formed the independent test dataset, DP93. To further evaluate the predictor, we collected an additional independent test dataset (DP94) of 98 sequences using the same protocol; these sequences are proteins newly added between DisProt versions 9.3 and 9.4. Statistics for these datasets are shown in Additional file 1: Table S2.
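
The key property of the split described above is that whole clusters, not individual sequences, are assigned to subsets, so no training sequence has a close homolog in the validation or test data. The following is a minimal sketch under stated assumptions: the sequence-to-cluster mapping is assumed to have been parsed already from CD-HIT output (parsing not shown), and the shuffled round-robin assignment of clusters to the five subsets is a hypothetical choice, since the exact randomization procedure is not specified. The function name `split_by_cluster` is illustrative only.

```python
import random
from collections import defaultdict


def split_by_cluster(seq_to_cluster, seed=0):
    """Cluster-level split into five subsets: 3 train, 1 validation, 1 test.

    seq_to_cluster: dict mapping sequence ID -> cluster ID (assumed to come
    from a CD-HIT clustering run; parsing is not shown here).
    Every sequence of a cluster ends up in the same subset.
    """
    clusters = defaultdict(list)
    for seq_id, cluster_id in seq_to_cluster.items():
        clusters[cluster_id].append(seq_id)

    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)  # random order of whole clusters

    subsets = [[] for _ in range(5)]
    for k, cid in enumerate(cluster_ids):
        # Hypothetical round-robin assignment of shuffled clusters to subsets.
        subsets[k % 5].extend(clusters[cid])

    train = subsets[0] + subsets[1] + subsets[2]
    return train, subsets[3], subsets[4]
```

Because assignment happens at the cluster level, subset sizes depend on cluster sizes. This explains why the three-subset training set (589 sequences) is not exactly three times the size of the validation subset (148 sequences).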

Graph-based interaction protein language model

Motivated by language models (LMs) in natural language processing (NLP) [61, 62], protein language models (PLMs) pre-trained on large numbers of amino acid sequences can discover the basic principles encoded in those sequences [63]. Studies [55, 64, 65] have demonstrated that protein semantic information extracted from PLMs can improve performance on various prediction tasks. In DisoFLAG, we employed a graph-based interaction protein language model (GiPLM) to provide six functional predictions for intrinsically disordered regions (see Fig. 1b). GiPLM integrates protein semantic information extracted from the ProtT5 [64] protein language model into graph-based interaction units to enhance the semantic correlation of multiple disordered functions. Specifically, a bidirectional gated recurrent unit (Bi-GRU) [66] layer is employed to capture the protein contextual semantic encodings \(\mathbf{P}\) from the embeddings extracted by ProtT5:

$$\mathbf{P}=\text{BiGRU}([\mathbf{r}_{1},\mathbf{r}_{2},\cdots,\mathbf{r}_{L}])$$
(1)

where \(\mathbf{r}_{i}\) is the PLM embedding vector of the ith residue and L is the length of the input sequence. Subsequently, a gated recurrent unit (GRU) layer with an attention mechanism [19, 67] is used to capture global correlations across the sequence and output the hidden representation \(\mathbf{h}_{i}\) of each residue:

$${\mathbf{h}}_{i}={\text{GRU}}({\mathbf{h}}_{i-1},{\mathbf{c}}_{i})$$
(2)
$$\mathbf{c}_{i}=\sum_{j=1}^{L}\alpha_{ij}\mathbf{p}_{j}$$
(3)
$$\alpha_{ij}=\frac{\exp(s_{ij})}{\sum_{k=1}^{L}\exp(s_{ik})}$$
(4)
$$s_{ij}=\mathbf{h}_{i-1}^{T}\mathbf{W}_{a}\mathbf{p}_{j}$$
(5)

where \(\mathbf{p}_{j}\in \mathbf{P}\) is the semantic encoding of the jth residue, \(\mathbf{W}_{a}\) is the trainable weight matrix of the attention mechanism, \(s_{ij}\) is the attention score between the ith and jth residues, \(\alpha_{ij}\) is the attention weight between the ith and jth residues, and \(\mathbf{c}_{i}\) is the attention-based contextual representation. Feature mapping layers then generate the functional semantic representations (X) for each residue. Specifically, six fully connected layers map the hidden global correlation representation \(\mathbf{h}_{i}\) to functional semantic representations:

$${\mathbf{X}}_{i}^{(n)}={\text{ReLU}}\left({\mathbf{h}}_{i}{\mathbf{W}}^{\left(n\right)}+{\mathbf{b}}^{\left(n\right)}\right)$$
(6)

where \({\mathbf{X}}_{i}^{(n)}\) is the nth functional semantic representation for the ith residue and \({\mathbf{W}}^{\left(n\right)}\) and \({\mathbf{b}}^{(n)}\) are weights and bias variables, respectively; ReLU is the nonlinear activation function.
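
Equations (3)–(6) can be sketched for a single decoding step with NumPy. This is a minimal illustration rather than the DisoFLAG implementation. The function names and array shapes are assumptions, and the GRU update of Eq. (2), which would consume the context vector, is not shown. The bilinear score \(\mathbf{h}_{i-1}^{T}\mathbf{W}_{a}\mathbf{p}_{j}\) is evaluated for all j at once as `P @ (W_a.T @ h_prev)`.

```python
import numpy as np


def attention_context(h_prev, P, W_a):
    """Eqs. (3)-(5): bilinear attention of h_{i-1} over all residue encodings.

    h_prev: [d_h] previous hidden state, P: [L, d_p] Bi-GRU encodings,
    W_a: [d_h, d_p] attention weights. Returns (c_i [d_p], alpha_i [L]).
    """
    s = P @ (W_a.T @ h_prev)           # s_ij = h_{i-1}^T W_a p_j for every j
    e = np.exp(s - s.max())            # max-shift for a numerically stable softmax
    alpha = e / e.sum()                # Eq. (4)
    c = alpha @ P                      # Eq. (3): sum_j alpha_ij p_j
    return c, alpha


def feature_mapping(h_i, Ws, bs):
    """Eq. (6): six independent ReLU dense layers -> six function embeddings.

    h_i: [d_h]; Ws: six [d_h, d_f] matrices; bs: six [d_f] biases.
    Returns an array of shape [6, d_f] (one row per disordered function).
    """
    return np.stack([np.maximum(0.0, h_i @ W + b) for W, b in zip(Ws, bs)])
```

The six mapping layers do not share weights, so each function receives its own projection of the same hidden state before the functions interact on the graph.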

A single disordered region can bind different ligands to perform multiple functions, and these functions are dependent and interrelated. In this study, we used Shannon information entropy (IE) [68] and information gain (IG) [69] to describe the correlations among different functions:

$$\mathrm{IG}_{XY}=H_{X\cup Y}-H_{XY}\quad(0\le \mathrm{IG}_{XY}<1)$$
(7)

where \({H}_{X\cup Y}\) and \({H}_{XY}\) represent the information required for individual prediction and joint prediction of X and Y functions, respectively [68]:

$${H}_{X\cup Y}=-{\sum }_{i\in X\cup Y}p\left(i\right){{\text{log}}}_{2}p(i)$$
(8)
$${H}_{XY}=-{\sum }_{i\in X}{\sum }_{j\in Y}p\left(ij\right){{\text{log}}}_{2}p\left(ij\right)$$
(9)

A higher IG value indicates a greater reduction in IE when two functions are predicted jointly, and hence a stronger correlation between them. We pre-calculated the IG values on the training dataset to obtain the IG matrix of the six disordered functions, which is visualized in Additional file 1: Fig. S1.
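
One way to compute such an IG matrix from per-residue binary annotations is shown below. It is a sketch under an explicit assumption: we read \(H_{X\cup Y}\) as the sum of the individual entropies H(X) + H(Y) and \(H_{XY}\) as the joint entropy H(X, Y), which makes IG equal to the mutual information between the two label tracks. That reading is ours, not a confirmed detail of DisoFLAG, and the function names are hypothetical. Under this reading the diagonal entries equal each function's own entropy, and how self-loops are handled in the adjacency matrix is not specified here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(x, y):
    """Assumed reading of Eq. (7): H(X) + H(Y) - H(X, Y) for two 0/1 label tracks."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def ig_matrix(labels):
    """labels: dict function name -> per-residue 0/1 list (same residue order).

    Returns a nested dict IG[a][b] for every pair of functions.
    """
    names = list(labels)
    return {a: {b: information_gain(labels[a], labels[b]) for b in names} for a in names}
```

For binary tracks this quantity lies between 0 (independent functions) and 1 bit, reaching 1 only if two perfectly coupled functions each cover exactly half of the residues.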

Each graph-based interaction unit in GiPLM then models the six disordered functions and their correlations as a fully connected functional graph G = (V, E), where V and E are the nodes and edges, respectively (see Fig. 1b). Each node represents a function and is represented by its functional semantic representation \(\mathbf{X}^{(i)}\). Edges represent correlations between functions and are encoded by the adjacency matrix. In GiPLM, we used a trainable weighted adjacency matrix to represent the degree of correlation between functions, initialized with the IG matrix pre-calculated on the training dataset using Eq. (7):

$${A}_{ij}={IG}_{ij}$$
(10)

Then, the graph convolutional network (GCN) layer was used to propagate and aggregate the semantic correlation features for each node on the functional graph [70]:

$${\mathbf{Y}}_{i}^{(n)}={\text{ReLU}}({\sum }_{j\in {N}_{i}}{\mathbf{A}}_{ij}{\mathbf{W}}^{\left(n\right){\prime}}{\mathbf{X}}_{i}+{\mathbf{b}}^{\left(n\right){\prime}})$$
(11)

where \({\mathbf{Y}}_{i}^{(n)}\) is the aggregated semantic feature of the nth functional node, \(\mathbf{A}\) is the trainable weighted adjacency matrix of the edges, \({\mathbf{X}}_{i}=[{\mathbf{X}}_{i}^{1},{\mathbf{X}}_{i}^{2},\cdots ,{\mathbf{X}}_{i}^{6}]\) is the concatenation of the six functional semantic representations, \({\mathbf{W}}^{\left(n\right){\prime}}\) is the convolution kernel, and ReLU is the nonlinear activation function. The semantic feature of the disorder \({\mathbf{Y}}_{i}^{IDR}\) was obtained by performing global max pooling over the functional graph (F represents the dimension of node features) [71]:

$$\mathbf{Y}_{i}^{\text{IDR}}=\max_{k\in F}\left([\mathbf{Y}_{i}^{(1)},\cdots,\mathbf{Y}_{i}^{(n)},\cdots,\mathbf{Y}_{i}^{(6)}]\right)$$
(12)

Finally, the propensity scores for disorder and the six disordered functions are calculated from the functional node features \(\mathbf{Y}_{i}^{(1)\sim(6)}\) and the disorder feature \(\mathbf{Y}_{i}^{\text{IDR}}\) by seven fully connected layers with Sigmoid activation functions [32, 59].
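
The graph-based interaction unit for a single residue can be sketched as follows. Assumptions, which are ours: we read Eq. (11) as aggregating the neighbor nodes through the adjacency matrix, \(\mathbf{Y}^{(n)}=\text{ReLU}(\sum_{m}A_{nm}\mathbf{X}^{(m)}\mathbf{W}^{(n)\prime}+\mathbf{b}^{(n)\prime})\) with node-specific kernels. We read Eq. (12) as an element-wise max over the six nodes, and the seven output heads as independent dense layers with a sigmoid. In training, `A` would start from the IG matrix and be updated by gradient descent, which is not shown. All names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def graph_interaction_unit(X, A, W_gcn, b_gcn, W_out, b_out):
    """Sketch of one graph-based interaction unit for a single residue.

    X:     [6, d]        function embeddings (graph nodes), e.g. from Eq. (6)
    A:     [6, 6]        weighted adjacency, initialised from the IG matrix (Eq. 10)
    W_gcn: [6, d, d_out] node-specific GCN kernels;  b_gcn: [6, d_out]
    W_out: [7, d_out]    one dense head per output;   b_out: [7]
    Returns seven propensities: six functions followed by disorder.
    """
    AX = A @ X                                              # row n: sum_m A[n, m] * X[m]
    Y = np.maximum(0.0, np.einsum("nd,nde->ne", AX, W_gcn) + b_gcn)  # [6, d_out]
    y_idr = Y.max(axis=0)                                   # max over the six nodes
    feats = np.vstack([Y, y_idr[None, :]])                  # [7, d_out]
    logits = np.einsum("kd,kd->k", feats, W_out) + b_out    # seven independent heads
    return sigmoid(logits)
```

Because the disorder feature is pooled from the function nodes, the disorder score in this sketch depends on the same correlation-aware features as the six functional scores.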

Model training and evaluation

To train the GiPLM model of DisoFLAG to predict disorder and disordered functions, we computed the binary cross-entropy loss for each prediction and averaged these losses to obtain the final loss L(\(\theta\)) [72]:

$$L(\theta )=-\frac{1}{n+1}{\sum }_{i=1}^{n+1}[{y}_{i}{\text{log}}\left(\widehat{{y}_{i}}\right)+(1-{y}_{i}){\text{log}}(1-\widehat{{y}_{i}})]$$
(13)

where \({y}_{i}\) (1 or 0) and \(\widehat{{y}_{i}}\) are the true label and the predicted propensity score of the ith output (disorder or one of the n = 6 functions), respectively. All model variables and hyper-parameters were optimized according to the minimum loss on the validation dataset. A detailed description of all trainable parameters and hyper-parameters of DisoFLAG is given in Additional file 1: Table S3.
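
Equation (13) is a mean of per-output binary cross-entropies over the n + 1 = 7 outputs. A minimal NumPy sketch follows. The clipping constant `eps` is an assumption added to avoid log(0). Averaging over a leading residue axis is also an assumption, since Eq. (13) is written for one residue and the handling of unannotated residues is not specified here.

```python
import numpy as np


def joint_bce(y_true, y_pred, eps=1e-7):
    """Eq. (13): mean binary cross-entropy over the n + 1 outputs.

    y_true, y_pred: arrays of shape [..., 7] (disorder + six functions);
    any leading axes (e.g. residues) are averaged as well (an assumption).
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # keep log() finite
    ll = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    return float(-np.mean(ll))
```

For example, predicting 0.5 for every output gives a loss of ln 2 ≈ 0.693 regardless of the labels.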

DisoFLAG outputs real-valued propensity scores for disorder and disordered functions. We evaluated DisoFLAG and the comparison methods with threshold-independent metrics [73,74,75,76,77]: AUC (area under the curve of true-positive rate versus false-positive rate across all thresholds), AUPR (area under the precision-recall curve across all thresholds), APS (average precision score along the precision-recall curve), and \({F}_{{\text{max}}}\) (the maximum harmonic mean of precision and recall across all thresholds). In addition, given a threshold, the real-valued scores can be converted into binary predictions: a residue is predicted to be disordered/functional if its propensity score is higher than the threshold and ordered/non-functional otherwise. We used the Matthews correlation coefficient (MCC) and balanced accuracy (BACC) to evaluate the binary predictions. Definitions of the evaluation metrics are given in Additional file 1: Table S4.
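
The threshold-based metrics above can be written out in a few lines of NumPy, using the binarization rule stated in the text (positive if the score is strictly higher than the threshold). This sketch is not the evaluation code used in the study. It assumes both classes are present in `y_true`. It sweeps \(F_{\text{max}}\) over every distinct binarization, and it returns the best threshold because the same max-F1 threshold is used to binarize predictions in the case study. AUC and APS are omitted; scikit-learn's `roc_auc_score` and `average_precision_score` are common choices for those.

```python
import numpy as np


def confusion(y_true, y_pred):
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn


def mcc_bacc(y_true, scores, threshold):
    """Binarise (positive if score > threshold), then compute MCC and BACC."""
    tp, tn, fp, fn = confusion(y_true, (scores > threshold).astype(int))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    bacc = 0.5 * (tp / (tp + fn) + tn / (tn + fp))
    return mcc, bacc


def f_max(y_true, scores):
    """Maximum F1 over all distinct binarisations; returns (Fmax, threshold)."""
    best_f, best_t = 0.0, None
    for t in np.concatenate(([-np.inf], np.unique(scores))):
        tp, _, fp, fn = confusion(y_true, (scores > t).astype(int))
        if tp == 0:
            continue
        precision, recall = tp / (tp + fp), tp / (tp + fn)
        f = 2 * precision * recall / (precision + recall)
        if f > best_f:
            best_f, best_t = f, t
    return best_f, best_t
```

Returning the threshold alongside \(F_{\text{max}}\) lets it be passed straight to `mcc_bacc`, since both functions apply the same strict "higher than" rule.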

Results and discussion

Protein semantic information facilitates the prediction of intrinsic disorder and disordered function

Protein feature representation is an essential step in DisoFLAG. We evaluated DisoFLAG with different protein representations, including protein language model (PLM)-based features (ProtT5 and ProtBERT), the position-specific scoring matrix (PSSM), and amino acid one-hot encodings (One-hot). Models taking different feature inputs were trained and optimized following the same protocol described in the “Methods” section. The evaluation results on the DP93 independent test dataset and the corresponding rankings are shown in Fig. 2a and Additional file 1: Table S5, respectively. These results show that models using PLM-based features outperformed those using PSSM and One-hot, and the model using ProtT5 consistently achieved the highest performance in predicting disorder and disordered functions. To further investigate the improvement brought by PLM-based features, we calculated the AUC values of DisoFLAG on sequences with different multiple sequence alignment (MSA) [78, 79] depths (see Fig. 2b–e). Specifically, for each sequence in the DP93 dataset, we used HHblits [80] to search for homologs in the UniProt database and grouped the sequences by the number of rows in the resulting MSA. The results on disorder (Fig. 2b) and disordered functions (Fig. 2c–e) show that the performance of the model using PLM encodings improved the most as the number of homologous sequences (i.e., MSA depth) increased, whereas at low MSA depths the PSSM encoding outperformed the PLM encoding. We attribute these results to the following possible reasons. (1) As a data-driven deep learning method, the PLM may capture sequence features accurately only when a sufficient number of homologous sequences is available. (2) In contrast, PSSM encoding, which is based on a probabilistic statistical model, is more effective at capturing sequence features at low MSA depths. (3) PSSM and the PLM capture different features: PSSM encodes sequence conservation, while the PLM learns contextual semantic information of protein sequences. Therefore, when few homologous sequences are available, conservation information is more informative than semantic information for predicting disordered functions.
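
The MSA-depth analysis amounts to binning proteins by the number of rows in their HHblits alignment and computing an AUC within each bin. The sketch below makes several hypothetical choices that are not stated in the text. The bin edges are supplied by the caller. Residues are pooled across proteins within a bin rather than averaged per protein, and AUC is computed with the rank-based (Mann-Whitney) formulation, with ties counted as one half. The pairwise comparison uses O(P·N) memory, which is fine for a sketch but not for very large bins.

```python
import numpy as np


def auc_rank(y_true, scores):
    """ROC AUC via the Mann-Whitney U statistic (ties count 1/2)."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def auc_by_msa_depth(records, edges):
    """records: list of (msa_depth, y_true array, score array), one per protein.

    edges: ascending bin edges (hypothetical); a protein falls in [lo, hi).
    Returns {(lo, hi): AUC} for bins that contain both classes.
    """
    out = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = [(y, s) for depth, y, s in records if lo <= depth < hi]
        if not sel:
            continue
        y = np.concatenate([y for y, _ in sel])
        s = np.concatenate([s for _, s in sel])
        if 0 < y.sum() < len(y):  # AUC is undefined without both classes
            out[(lo, hi)] = auc_rank(y, s)
    return out
```

Comparing the per-bin AUC of a PLM-based model with that of a PSSM-based model on the same bins reproduces the kind of trend plotted in Fig. 2b–e.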

Fig. 2

Performance of DisoFLAG in predicting disorder and disordered functions using different feature representations. a AUC value comparisons of DisoFLAG using different features, including protein language model-based features (ProtT5 and ProtBERT) and classic protein feature representations by position-specific scoring matrix (PSSM) and amino acid one-hot encodings (One-hot). The performance of DisoFLAG in predicting disorder (b) and disordered functions (c–e) for sequences with different multiple sequence alignment (MSA) depths

Graph-based interaction unit enhances the semantic correlations of multiple disordered functions

The graph-based interaction unit (GiU) in DisoFLAG establishes correlations among multiple disordered functions. To investigate the role of GiU, we compared DisoFLAG using GiU with a variant using a simple sequence layer (Seq) instead (see Table 1). DisoFLAG with GiU consistently outperformed the Seq variant, indicating that the semantic correlation features captured by GiU improve the predictive performance of DisoFLAG. Because of the correlations among disordered functions, a single disordered residue can perform two or more functions; such residues are referred to as multifunctional (MF) residues. We compared DisoFLAG with other methods for predicting MF residues on the DP93 test dataset, where an MF residue is considered correctly predicted only if all of its functions are correctly predicted. The Fmax results of the different methods are shown in Fig. 3a. The DP93 dataset contains six types of MF residues, and DisoFLAG is the only predictor that can predict all of them. Moreover, by considering correlations among functions, DisoFLAG achieved the highest Fmax values of all predictors, which again indicates the importance of the functional correlations captured by GiU for accurate prediction of disordered functions.
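
The correctness rule for multifunctional residues can be made concrete with a small helper. This is a hypothetical sketch. It groups MF residues by their combination of true functions and counts a residue as correct only if every one of its true functions is predicted positive. Extra predicted functions are not penalized here, which is our assumption. The column order in `FUNCTIONS` is illustrative. The sketch works on fixed binary predictions and does not reproduce the threshold sweep behind the reported Fmax values.

```python
import numpy as np
from collections import defaultdict

# Hypothetical column order of the six disordered functions.
FUNCTIONS = ["protein", "DNA", "RNA", "ion", "lipid", "linker"]


def mf_residue_accuracy(y_true, y_pred):
    """y_true, y_pred: [N, 6] binary arrays (residues x functions).

    A residue with >= 2 true functions is multifunctional; it counts as
    correct only if all of its true functions are predicted positive.
    Returns {function combination: (n_correct, n_total)}.
    """
    stats = defaultdict(lambda: [0, 0])
    for t, p in zip(y_true, y_pred):
        idx = np.flatnonzero(t)
        if len(idx) < 2:
            continue  # not a multifunctional residue
        key = "+".join(FUNCTIONS[k] for k in idx)
        stats[key][1] += 1
        stats[key][0] += int(np.all(p[idx] == 1))
    return {k: tuple(v) for k, v in stats.items()}
```

Grouping by function combination mirrors the per-type breakdown of MF residues in Fig. 3a.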

Table 1 Performance comparisons of DisoFLAG using graph-based interaction unit (GiU) and sequence layer (Seq) for predicting different disordered functions on the DP93 independent test dataset
Fig. 3

Functional correlations contribute to the prediction of disordered functions. a Performance comparison of different predictors on multifunctional residues; “/” indicates that the predictor failed to process this subset of residues. The information gain (IG) values calculated on the DP93 test dataset (b) and their contributions (c) to the prediction of different functions, calculated by layer-wise relevance propagation (LRP)

Furthermore, we used layer-wise relevance propagation (LRP) [81, 82] to investigate the contributions of functional correlations to the prediction results. The LRP score was calculated as follows:

$$\text{R}_{j}^{(l)}=\sum_{k}\left(\alpha\frac{w_{jk}^{+}h_{j}}{\sum_{j'}w_{j'k}^{+}h_{j'}+b_{k}^{+}}-\beta\frac{w_{jk}^{-}h_{j}}{\sum_{j'}w_{j'k}^{-}h_{j'}+b_{k}^{-}}\right)\text{R}_{k}^{(l+1)}$$
(14)

where \({{\text{R}}}_{j}^{(l)}\) and \({{\text{R}}}_{k}^{(l+1)}\) are the relevance scores of layer l and of layer l + 1 (the layer closer to the output, from which relevance is propagated back), respectively; \(\alpha\) and \(\beta\) are the parameters of the \(\alpha \beta\) rule in LRP, constrained by \(\alpha -\beta =1\); and \({w}_{jk}\), \({b}_{k}\), and \({h}_{j}\) are the weights, bias, and hidden activations, respectively, with superscripts + and − denoting their positive and negative parts. We applied LRP to the graph-based interaction units to obtain the importance of functional correlations. For each function, the importance scores of functional correlations for the propensity score were obtained by summing the relevance scores from Eq. (14) over all true-positive predictions on the DP93 test dataset. Figure 3b shows the IG values calculated on the DP93 dataset, which reflect the correlations among functions. These correlations consistently made a positive contribution to the prediction of all six disordered functions (see Fig. 3c).
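
The αβ rule of Eq. (14) can be sketched for a single dense layer in NumPy. This is a generic illustration of the rule rather than DisoFLAG's LRP code. The stabilizer `eps` is an assumption that keeps columns with no positive (or no negative) contributions finite. Splitting the products \(h_j w_{jk}\) into positive and negative parts matches the \(w^{+}h\) and \(w^{-}h\) form of Eq. (14) when the activations are non-negative, as after a ReLU.

```python
import numpy as np


def lrp_alpha_beta(h, W, b, R_out, alpha=2.0, beta=1.0, eps=1e-9):
    """Redistribute the relevance R_out [K] of a dense layer z = h @ W + b
    onto its inputs h [J] using the alpha-beta rule (alpha - beta = 1)."""
    assert abs((alpha - beta) - 1.0) < 1e-12, "alpha - beta must equal 1"
    z = h[:, None] * W                    # contributions h_j * w_jk, shape [J, K]
    z_pos, z_neg = np.clip(z, 0.0, None), np.clip(z, None, 0.0)
    b_pos, b_neg = np.clip(b, 0.0, None), np.clip(b, None, 0.0)
    denom_pos = z_pos.sum(axis=0) + b_pos + eps   # column-wise normalisers
    denom_neg = z_neg.sum(axis=0) + b_neg - eps
    ratio = alpha * z_pos / denom_pos - beta * z_neg / denom_neg  # [J, K]
    return ratio @ R_out                  # sum over k -> relevance per input j
```

With zero biases, the rule approximately conserves total relevance: the returned scores sum to the sum of `R_out` whenever every output column has both positive and negative contributions. With α = 1 and β = 0, only positive contributions are needed. Summing the input relevances that belong to a node's features gives a node-level contribution score of the kind shown in Fig. 6c.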

Comparison of DisoFLAG to other methods in the prediction of disordered functions

We evaluated DisoFLAG for predicting disordered functions and compared it with methods that were specifically designed for disordered function prediction and performed well in CAID2. These include DisoRDPbind [29] and DeepDISOBind [30], which predict protein-, DNA-, and RNA-binding disordered regions (IDRs); fIDPnn [26], which predicts protein-, DNA-, and RNA-binding IDRs and disordered linkers; ANCHOR-2 [27], which predicts protein-binding IDRs; MoRFchibi (Light and Web) [34] and SPOT-MoRF [33], which identify molecular recognition features (MoRFs), i.e., protein-binding IDRs that undergo a disorder-to-order conformational transition; DisoLipPred [32], the only available method for predicting lipid-binding IDRs; and TransDFL [39] and DFLpred [38], which identify disordered linkers. DisoFLAG is currently the only predictor that provides predictions of ion-binding IDRs, and it covers the broadest range of disordered functional categories. The evaluation results on the DP93 test dataset show that DisoFLAG outperforms current tools in predicting disordered protein binding, DNA binding, RNA binding, lipid binding, and linkers, as quantified by the AUC, MCC, and BACC metrics (Table 2). Moreover, DisoFLAG offered statistically significant improvements in AUC over the other methods (see Additional file 1: Table S6). To further investigate the stability of the prediction performance, we compared the methods on the DP94 test dataset, whose proteins were collected from the latest DisProt version 9.4. In terms of AUC, DisoFLAG still significantly outperforms current tools in predicting disordered protein binding, DNA binding, and linkers; however, its performance decreased for RNA binding and lipid binding (see Additional file 1: Tables S7 and S8). We also report performance metrics at the protein level in Additional file 1: Tables S9 and S10.

Table 2 Performance comparisons of DisoFLAG and other predictors on the DP93 independent test dataset

We compared DisoFLAG with a broad range of predictors that participated in the Critical Assessment of protein Intrinsic Disorder (CAID2) challenge. Specifically, we assessed DisoFLAG on two CAID2 test datasets: disorder-binding and disorder-linker [22, 23]. The disorder-binding dataset contains 78 proteins annotated with interaction interfaces in disordered regions, and the disorder-linker dataset contains 40 proteins with disordered flexible linkers. We aligned the CAID2 sequences against all benchmark datasets used in this study and confirmed that the CAID2 sequences were completely unseen during the training and validation of DisoFLAG, consistent with the CAID2 assessment process. Therefore, the results of DisoFLAG can be compared directly with those reported in CAID2. We assessed DisoFLAG's protein-, DNA-, RNA-, ion-, and lipid-binding predictions on the disorder-binding dataset and its linker predictions on the disorder-linker dataset. The evaluation results and the comparison with the 10 top-ranking methods reported in CAID2 [22, 23] are shown in Fig. 4. As shown in Fig. 4a, b, DisoFLAG's protein-binding predictor generates the highest-quality predictions, with AUC = 0.879 and APS = 0.563 on the disorder-binding dataset. DisoFLAG's linker predictor achieves AUC = 0.8 and APS = 0.197 for disordered linkers on the disorder-linker dataset (see Fig. 4c, d). The complete metrics are listed in Additional file 1: Tables S11 and S12.

Fig. 4

Performance comparisons of DisoFLAG and the 10 top-ranking methods in CAID2 for disordered binding and linker prediction. The receiver operating characteristic (ROC) curves for the disorder-binding and disorder-linker predictions are shown in a and c, respectively; methods are sorted by the area under the ROC curve (AUC). The precision-recall (PR) curves for the disorder-binding and disorder-linker predictions are shown in b and d, respectively; methods are sorted by the average precision score (APS), and points correspond to the Fmax values. “C” represents the coverage of prediction results

Comparison of DisoFLAG to other methods in the prediction of intrinsic disorder

We assessed DisoFLAG's performance in predicting intrinsic disorder on two disorder test datasets provided in CAID2: Disorder-NOX and Disorder-PDB. The Disorder-NOX dataset is composed of IDRs from the DisProt database, excluding X-ray missing residues. In contrast, the Disorder-PDB dataset is more conservative, strictly limiting negative samples to structured residues observed in the PDB database. For more details about the datasets, please refer to the CAID2 publications [22, 23]. We performed a thorough sequence comparison of the two CAID2 datasets against the benchmark datasets used in this study to ensure that all sequences were independent of, and unseen during, the training and validation of DisoFLAG. We then compared DisoFLAG with the top 10 methods reported in CAID2 (see Fig. 5). DisoFLAG ranked second in AUC (0.836) and fourth in APS (0.560) on the Disorder-NOX dataset. DisoFLAG showed lower performance on the Disorder-PDB dataset but achieved performance comparable to the CAID2 top 10 in terms of AUC and APS. The complete metrics are in Additional file 1: Tables S13 and S14.

Fig. 5

Performance comparisons of DisoFLAG and the 10 top-ranking methods in CAID2 for disorder prediction. The receiver operating characteristic (ROC) curves on the Disorder-NOX (210 proteins) and Disorder-PDB (348 proteins) datasets are shown in a and c, respectively; methods are sorted by the area under the ROC curve (AUC). The precision-recall (PR) curves on the Disorder-NOX and Disorder-PDB datasets are shown in b and d, respectively; methods are sorted by the average precision score (APS), and points correspond to the Fmax values. “C” represents the coverage of prediction results

Case study

We examined DisoFLAG's predictions for one protein from the independent test data: the human immunodeficiency virus infectivity factor (HIV-1 Vif, DisProt: DP00875). Vif is a crucial accessory protein in HIV replication whose role is to disrupt the antiviral activity of the human host defense factor APOBEC-3G (A3G) [83]. Vif performs its function through interactions with A3G, protein chaperones, ubiquitination machinery factors, and other partners [84, 85]. Elucidating the functional mechanism of Vif is therefore of significant importance for discovering novel drugs that block its activity [85,86,87]. Nuclear magnetic resonance (NMR) revealed that the C-terminal domain (residues 141–192) of Vif is unstructured under physiological conditions. Figure 6a shows a protein complex structure containing Vif (PDB ID: 8E40) [88] from the PDB database [89]. Experimental evidence suggests that the disordered region of Vif is involved in binding proteins and lipids [90]. The propensity scores produced by DisoFLAG for Vif are visualized in Fig. 6b. To investigate the contribution of functional correlations to DisoFLAG's predictions, we used LRP to map the highest protein-binding propensity score, located at residue T170, onto the functional graph in DisoFLAG (Fig. 6c). Protein-binding, RNA-binding, and lipid-binding nodes made positive contributions to this prediction, and the edge between the protein-binding and lipid-binding nodes contributed the most. We further compared the binary protein-binding and lipid-binding predictions of DisoFLAG and other methods for Vif. As shown in Fig. 6d, e, DisoFLAG is the only method that simultaneously identifies the complete disordered protein-binding and lipid-binding regions of Vif, and it has the fewest false-positive predictions. These results again highlight that the semantic correlations captured by the graph-based interaction protein language model (GiPLM) enable DisoFLAG to provide accurate and comprehensive predictions of multiple disordered functions.

Fig. 6

Prediction results of DisoFLAG for Vif protein. a Protein complex structure (PDB ID: 8E40) of Vif (colored in red), A3G (colored in blue), CBF-beta (colored in yellow), and fork RNA (colored in orange). b The propensity score results predicted by DisoFLAG for the Vif protein. c LRP of residue T170’s protein-binding propensity score on the functional graph, where the contribution scores of nodes were calculated by summing the relevance scores of node features, and the contribution score of the edge was equal to the sum of contributions of two nodes it links. The binary results of protein binding (d) and lipid binding (e) predicted by DisoFLAG and other methods for the Vif protein. The binary results were converted from the propensity scores of different methods using a threshold that achieves the maximum F1 score

Conclusions

Inspired by the similarities between biological sequences and natural language at three hierarchical levels, we designed the DisoFLAG predictor based on a graph-based interaction protein language model. DisoFLAG predicts intrinsic disorder and six types of disordered functions: protein binding, DNA binding, RNA binding, ion binding, lipid binding, and flexible linkers. Assessments on two independent test datasets and the CAID2 benchmark datasets indicated that DisoFLAG offers accurate and comprehensive predictions of disordered functions and extends the range of disordered function categories that can be predicted computationally. Our analysis of DisoFLAG's predictions demonstrated that protein semantic knowledge extracted from the pre-trained protein language model facilitated accurate prediction of multiple disordered functions. The graph-based interaction unit in DisoFLAG enhanced the semantic relevance among multiple disordered functions, leading to a significant improvement in the identification of multifunctional disordered residues. We provide a standalone package and a convenient web server for DisoFLAG, which we expect to be helpful tools for researchers in related fields.