Background

Circular RNAs (circRNAs) are a class of endogenous noncoding RNAs with distinct properties and diverse cellular functions. Unlike linear RNAs, whose 5’ and 3’ termini reflect the start and stop of RNA polymerase on the DNA template, circRNAs are generated by back-splicing (3’-5’) or from lariat introns [1–4]. Because they lack free ends, circRNAs are not easily degraded by exoribonucleases [5, 6]. As circRNA formation is usually considered a rare event in cells, it was suggested that circRNAs may be errors of the normal splicing process [4, 7]. Consequently, despite their presence in both unicellular and multicellular organisms, they were previously disregarded as transcriptional noise or artifacts [8]. Nevertheless, with advances in high-throughput deep sequencing and functional genomics, knowledge of circRNAs has recently grown substantially [9, 10].

To date, circRNAs have been found in various tissues and cell lines of plants, animals and other organisms [4, 11, 12]. Some circRNAs can be translated into proteins in a splicing-dependent, cap-independent manner in certain tissues or under certain conditions [13]. Furthermore, because circRNAs have much longer half-lives than linear RNA transcripts, they are expected to have functions independent of their host genes [10]. Many circRNAs can regulate gene expression because they have a strong potential to act as miRNA sponges or decoys [14]. Some circRNAs can also function as protein sponges or decoys; the best-known example is the protein MBL, which is prevented from binding to other targets when it is tethered to a circRNA [15]. The circRNA circFoxo3 can also act as a protein scaffold that contains binding sites for mouse MDM2 and p53 [16]. Whereas the functions above depend on circRNAs being located in the cytoplasm, some circRNAs, such as exon-intron circRNAs, are retained in the nucleus, where they may promote transcription [17].

As the functions of circRNAs have become better understood, growing evidence has shown that circRNAs play important roles in the occurrence of complex human diseases such as cancer [18]. CircRNA ciRS-7 has significant implications for disease because it efficiently regulates the activity of miR-7 [19]. Likewise, by sponging miR-7, miR-17 and miR-214, cir-ITCH increases the level of ITCH, which in turn inhibits the Wnt pathway that is frequently aberrant in cancers [20, 21]. SRY, which acts as a sponge of miR-138 and strongly suppresses its level, can affect the proliferation, migration and invasion of cholangiocarcinoma cells [22, 23]. The level of circRNA-MYLK is elevated and correlated with bladder carcinoma (BC) progression, and circRNA-MYLK plays an oncogenic role in BC in vitro and in vivo [24]. Circ-Foxo3 was minimally expressed in patient tumor samples and in a panel of cancer cell lines, and its expression was found to increase significantly during cancer cell apoptosis [16, 25]. Circular RNA MTO1 can suppress hepatocellular carcinoma progression by acting as a sponge of miR-9 [26]. In addition, aberrant expression of circCCDC66 is associated with late-stage diagnosis and metastasis [27].

In recent years, several circRNA databases have been developed to support further study of circRNA functional mechanisms. CircBase, the first circRNA database, merges and unifies circRNA data sets and provides an interface to access, download and browse the evidence supporting their expression within the genomic context [28]. CircRNADb is a comprehensively annotated database of human circRNAs; it contains 32,914 human exonic circRNAs from diverse sources and provides the genomic information, exon splicing, genome sequence, internal ribosome entry site (IRES), open reading frame (ORF) and references for these circRNAs [29]. PlantcircBase is a database of plant circRNAs that also provides functions such as visualizing circRNA structures based on their genomic positions [12]. PlantCircNet is another plant circRNA database; its main feature is the visualization of plant circRNA-miRNA-mRNA regulatory networks, and it can also identify metabolic effects of circRNAs [30]. ExoRBase is a web-accessible database that provides circRNA, lncRNA and mRNA information derived from RNA-seq analyses of human blood exosomes [31]. CircNet provides tissue-specific circRNA expression profiles and circRNA-miRNA-gene regulatory networks, using sequencing datasets to systematically identify circRNA expression in RNA-seq samples [32]. TSTD also provides tissue-specific circRNAs and further characterizes their functions [33]. The SomamiR 2.0 database provides cancer somatic mutations that alter miRNA targeting and function, and also collects associations between miRNAs and other competing endogenous RNAs such as mRNAs, circRNAs and lncRNAs [34].

CSCD is a cancer-specific circRNA database that identifies cancer-specific circRNAs by analyzing RNA-seq samples and further predicts the miRNA response element sites and RNA-binding protein sites of each circRNA [35]. Circ2Traits is a circRNA-disease association database constructed from circRNA-miRNA associations, miRNA-disease associations and disease-SNP associations [18]. To our knowledge, CircR2Disease is the first manually curated database of circRNA-disease associations; it was compiled by reviewing the existing literature and provides an important foundation for studying the associations between circRNAs and diseases [36].

In general, significant progress has been made in understanding the features and functions of circRNAs, and several circRNA databases have been constructed. However, current studies of circRNA-disease associations rely mainly on biomedical experiments, which are notoriously expensive and time-consuming. There is therefore an urgent need for computational methods to predict circRNA-disease associations. To our knowledge, the development of such methods has been very limited because databases of circRNA-disease associations are incomplete. CircR2Disease, however, provides an opportunity to develop computational methods that effectively predict novel circRNA-disease associations.

In this study, we develop a novel method (called DWNN-RLS) to predict new circRNA-disease associations. First, DWNN-RLS computes the Gaussian interaction profile (GIP) kernel similarities of circRNAs and diseases based on the known circRNA-disease associations. The semantic similarity of diseases is also calculated from their directed acyclic graph (DAG) representations, and the final disease similarity is taken as the mean of the GIP similarity and the semantic similarity. The decreasing weight k-nearest neighbor (DWNN) method is used to calculate initial association scores for new circRNAs and new diseases. The association scores of circRNA-disease pairs are then predicted by the Kronecker product kernel-based regularized least squares approach, in which the kernel of circRNA-disease pairs is the Kronecker product of the circRNA kernel and the disease kernel. To assess the prediction performance of DWNN-RLS and compare it with competing methods, we conduct 5-fold cross validation (5CV), 10-fold cross validation (10CV) and leave-one-out cross validation (LOOCV). The experimental results demonstrate that DWNN-RLS outperforms six competing methods (RLS-avg, RLS-Kron, NetLapRLS, KATZ, NBI and WP) in terms of AUC (area under the ROC curve). Specifically, the AUC values of DWNN-RLS in 5CV, 10CV and LOOCV reach 0.8854, 0.9205 and 0.9701, respectively, exceeding the second-best results (KATZ: 0.8224 in 5CV and 0.8343 in 10CV; RLS-avg: 0.9169 in LOOCV). Case studies further illustrate the predictive ability of DWNN-RLS.

Methods

Materials

In this study, we download the known circRNA-disease associations from the CircR2Disease database (http://bioinfo.snnu.edu.cn/CircR2Disease/). These associations were manually curated from the literature published prior to 31 March 2018. After removing duplicated records, we obtain a benchmark dataset of 725 circRNA-disease associations involving 676 circRNAs and 100 diseases. In addition, the MeSH database [37] (https://www.nlm.nih.gov/bsd/disted/meshtutorial/themeshdatabase/) is used to compute the semantic similarity of diseases.
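As a minimal sketch of this preprocessing step, the snippet below deduplicates (circRNA, disease) records and builds the binary association matrix that the following subsections call Y. It assumes the records are already available as pairs of name strings; how the CircR2Disease export is parsed is not described here, and the function name and example names are hypothetical. Applied to the benchmark dataset, it would give a 676 × 100 matrix with 725 ones, so only about 1.1% of the 67,600 circRNA-disease pairs carry a known association.

```python
import numpy as np

def build_association_matrix(pairs):
    """Build a binary circRNA-disease matrix from (circRNA, disease) pairs.

    Duplicate pairs are dropped first, mirroring the de-duplication step.
    Rows follow sorted circRNA names and columns sorted disease names
    (an illustrative ordering choice, not prescribed by the text).
    """
    unique_pairs = sorted(set(pairs))                  # remove duplicated records
    circs = sorted({c for c, _ in unique_pairs})
    diseases = sorted({d for _, d in unique_pairs})
    c_idx = {c: i for i, c in enumerate(circs)}
    d_idx = {d: j for j, d in enumerate(diseases)}
    Y = np.zeros((len(circs), len(diseases)), dtype=float)
    for c, d in unique_pairs:
        Y[c_idx[c], d_idx[d]] = 1.0                    # y_ij = 1 for a known association
    return Y, circs, diseases

# Toy records with hypothetical names; the first pair appears twice.
records = [("hsa_circ_A", "Breast cancer"),
           ("hsa_circ_A", "Breast cancer"),
           ("hsa_circ_B", "Breast cancer"),
           ("hsa_circ_B", "Glioma")]
Y, circs, diseases = build_association_matrix(records)
print(Y.shape, int(Y.sum()))   # (2, 2) 3
```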

Similarity of circRNAs

Given the successful application of the GIP kernel similarity in related areas [38–42], we also use it to calculate the similarities of circRNAs. The GIP kernel is computed from the known circRNA-disease associations. Let \(C=\left \{c_{1},c_{2},...,c_{N_{c}}\right \}\) be the set of Nc circRNAs and \(D=\left \{d_{1},d_{2},...,d_{N_{d}}\right \}\) be the set of Nd diseases. Let the matrix \(Y \in \mathbb{R}^{N_{c} \times N_{d}}\) represent the known circRNA-disease associations, where \(y_{ij}=1\) if circRNA ci and disease dj have a known association and \(y_{ij}=0\) otherwise. The GIP similarity of circRNAs ci and cj is then computed as follows:

$$\begin{array}{@{}rcl@{}} S_{c}\left(c_{i},c_{j}\right)= G_{c}\left(c_{i},c_{j}\right) = \exp\left(-\gamma_{c} {||y_{c_{i}}-y_{c_{j}}||}^{2}\right) \end{array} $$
(1)
$$ \gamma_{c} = 1 /\left(\frac{1}{N_{c}}\sum\limits_{i=1}^{N_{c}}{||y_{c_{i}}||}^{2}\right), $$
(2)

where \(y_{c_{i}}=\left \{y_{i1},y_{i2},...,y_{{i}{N_{d}}}\right \}\) and \(y_{c_{j}}=\left \{y_{j1},y_{j2},...,y_{{j}{N_{d}}}\right \}\) are the association profiles (rows of Y) of circRNAs ci and cj, respectively. Because the GIP kernel is a decaying function of the distance between two profiles, it has the form of a bell-shaped curve. A larger γc yields a narrower bell and a smaller γc a wider one, so γc regulates the kernel bandwidth. In this study, γc is the reciprocal of the average number of associations per circRNA: since the profiles are binary, \({||y_{c_{i}}||}^{2}\) equals the number of known associations of ci, so Eq. (2) is exactly this reciprocal.
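Eqs. (1)-(2) can be sketched with NumPy as follows. The function name `gip_kernel` is illustrative, not taken from the authors' code; it computes all pairwise squared distances at once through the expansion \(||a-b||^{2} = ||a||^{2} + ||b||^{2} - 2a^{T}b\). Because Eqs. (3)-(4) have the same form over the columns of Y, the disease GIP similarity is obtained by passing the transpose.

```python
import numpy as np

def gip_kernel(Y):
    """GIP kernel over the rows of a binary association matrix Y (Eqs. 1-2).

    Row i is the association profile of entity i. gamma is the reciprocal of
    the mean squared profile norm, i.e. of the average number of associations
    per row when Y is binary.
    """
    Y = np.asarray(Y, dtype=float)
    sq_norms = np.sum(Y ** 2, axis=1)                  # ||y_i||^2
    gamma = 1.0 / np.mean(sq_norms)                    # Eq. (2); assumes Y has at least one 1
    # ||y_i - y_j||^2 = ||y_i||^2 + ||y_j||^2 - 2 <y_i, y_j>
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    sq_dists = np.maximum(sq_dists, 0.0)               # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)                   # Eq. (1)

Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
S_c = gip_kernel(Y)     # circRNA similarity from the rows of Y
G_d = gip_kernel(Y.T)   # disease GIP similarity (Eqs. 3-4) from the columns of Y
```

In this toy matrix the circRNAs have 2, 1 and 1 associations, so γc = 1/(4/3) = 0.75, and circRNAs 1 and 2 (profiles differing in one position) get similarity exp(-0.75).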

Similarity of diseases

First, we compute the GIP similarity of diseases di and dj as follows:

$$\begin{array}{@{}rcl@{}} G_{d}\left(d_{i},d_{j}\right) = \exp\left(-\gamma_{d} {||y_{d_{i}}-y_{d_{j}}||}^{2}\right) \end{array} $$
(3)
$$ \gamma_{d} = 1 /\left(\frac{1}{N_{d}}\sum\limits_{i=1}^{N_{d}}{||y_{d_{i}}||}^{2}\right), $$
(4)

where \(y_{d_{i}}=\left \{y_{1i},y_{2i},...,y_{{N_{c}}{i}}\right \}^{T}\) and \(y_{d_{j}}=\left \{y_{1j},y_{2j},...,y_{{N_{c}}{j}}\right \}^{T}\) are the association profiles (columns of Y) of diseases di and dj, respectively. As with γc, the parameter γd regulates the kernel bandwidth; it is the reciprocal of the average number of associations per disease.

Second, we use the MeSH descriptors of diseases to compute their semantic similarity. In the MeSH database, a disease A can be represented by a DAG, \(DAG_{A}=(T_{A},E_{A})\), where the node set \(T_{A}\) contains A itself and all of its ancestor disease nodes, and the edge set \(E_{A}\) contains the direct edges between the disease nodes in \(T_{A}\). The semantic similarity of diseases A and B is calculated as follows:

$$\begin{array}{@{}rcl@{}} {D_{semsim}(A,B)} = \frac{\sum\limits_{t \in {T_{A}}\cap{T_{B}}}\left({SV}_{A}(t)+{SV}_{B}(t)\right)}{Sem(A)+Sem(B)}, \end{array} $$
(5)

where \(SV_{A}(t)\) (respectively \(SV_{B}(t)\)) is the semantic value contributed by node t to disease A (respectively B), and t ranges over \(T_{A}\cap T_{B}\), the nodes shared by the DAGs of A and B (their common ancestors). Sem(A) and Sem(B) are the semantic values of diseases A and B, respectively. For disease A, Sem(A) and \(SV_{A}(t)\) are calculated as follows:

$$\begin{array}{@{}rcl@{}} {Sem(A)} = {\sum\limits_{t \in {T_{A}}}{SV}_{A}(t)}, \end{array} $$
(6)
$$ SV_{A}(t) = \begin{cases} 1, & t = A \\ \Delta^{w}, & t \in T_{A},\ t \neq A,\ \text{where } w \text{ is the smallest number of layers between } t \text{ and } A \end{cases} $$
(7)

where Δ is the layer contribution factor between a disease node and its direct ancestor nodes in the DAG, so each additional layer between A and an ancestor multiplies that ancestor's contribution by Δ. The value of Δ is set to 0.5 in this study [37].
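A minimal sketch of Eqs. (5)-(7), assuming the MeSH hierarchy has already been loaded into a dictionary that maps each disease node to its direct parents; the loading step, the function names and the node names below are hypothetical. A breadth-first search upward from A reaches each ancestor first at its smallest layer distance w, which is the w used in Eq. (7).

```python
from collections import deque

def semantic_values(disease, parents, delta=0.5):
    """SV_A(t) for every t in T_A (Eq. 7): delta ** (smallest number of layers from A to t)."""
    sv = {disease: 1.0}
    queue = deque([(disease, 0)])
    while queue:
        node, depth = queue.popleft()
        for p in parents.get(node, ()):
            if p not in sv:                            # BFS: first visit has the shortest distance
                sv[p] = delta ** (depth + 1)
                queue.append((p, depth + 1))
    return sv

def semantic_similarity(a, b, parents, delta=0.5):
    """D_semsim(A, B) from Eqs. (5)-(6)."""
    sv_a = semantic_values(a, parents, delta)
    sv_b = semantic_values(b, parents, delta)
    common = sv_a.keys() & sv_b.keys()                 # T_A ∩ T_B
    shared = sum(sv_a[t] + sv_b[t] for t in common)
    return shared / (sum(sv_a.values()) + sum(sv_b.values()))   # Sem(A) + Sem(B)

# Toy DAG (hypothetical nodes): A -> X -> R and B -> {X, Y} -> R.
parents = {"A": ["X"], "B": ["X", "Y"], "X": ["R"], "Y": ["R"]}
```

In this toy DAG, Sem(A) = 1 + 0.5 + 0.25 = 1.75 and Sem(B) = 1 + 0.5 + 0.5 + 0.25 = 2.25; A and B share X and R, so their similarity is (0.5 + 0.5 + 0.25 + 0.25) / 4.0 = 0.375.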

After computing the GIP similarity and semantic similarity of diseases, we take their mean as the final disease similarity:

$$\begin{array}{@{}rcl@{}} S_{d} = \frac{G_{d}+D_{semsim}}{2}. \end{array} $$
(8)

DWNN for new circRNAs and diseases

The performance of a prediction method depends largely on the quality of the known circRNA-disease associations. However, new circRNAs (or new diseases) have no known association with any disease (or circRNA). In this study, we use DWNN to compute initial association scores for such new circRNAs and diseases based on their similarities. Specifically, the initial association score between a new circRNA ci and a disease dj is calculated as follows:

$$ y\left(c_{i},d_{j}\right) = \frac{\sum G{^{il}_{c}}y_{lj}}{\sum G{^{il}_{c}}}, c_{l} \in N{(c_{i})} $$
(9)

where \(G_{c}^{il}=G_{c}(c_{i},c_{l})\) and N(ci) is the set of the \(k_{c_{i}}\) nearest neighbors of the new circRNA ci. The parameter \(k_{c_{i}}\) is calculated as follows:

$$ k_{c_{i}} = \begin{cases} \max\left\{k : \frac{1-simset(c_{i})_{l}}{l} \le \epsilon^{l} \ \text{for all } 1 \le l \le k\right\}, & \text{if such a } k \text{ exists} \\ 0, & \text{otherwise} \end{cases} $$
(10)

where \(simset(c_{i})_{l}\) is the l-th value of the vector of similarities between ci and the other circRNAs, sorted from high to low. The parameter ε controls the threshold \(\epsilon^{l}\) used to select the k nearest neighbors of each new circRNA and disease. In this study, ε is set to 1, so \(\epsilon^{l}=1\) for every l; since all similarities lie in [0, 1], the condition in Eq. (10) always holds and all neighbors are used to calculate the initial association score.
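The neighbor selection of Eq. (10) and the weighted average of Eq. (9) can be sketched as below. This is an illustrative reading of the equations, not the authors' implementation: the function names are hypothetical, ties in similarity are broken by index, the entity itself is excluded from its own neighbor list, and a zero vector is returned when no neighbor qualifies. How the initial scores enter the model is also an assumption here; presumably they fill the all-zero row (or column) of Y before the RLS-Kron step. The disease case of Eqs. (11)-(12) follows by calling the same function with the disease similarity matrix and `Y.T`.

```python
import numpy as np

def dwnn_k(sims_desc, eps):
    """Eq. (10): largest k such that (1 - s_l) / l <= eps**l holds for every l = 1..k."""
    k = 0
    for l, s in enumerate(sims_desc, start=1):
        if (1.0 - s) / l <= eps ** l:
            k = l
        else:
            break                                      # the condition must hold for all l <= k
    return k

def dwnn_initial_scores(i, S, Y, eps=1.0):
    """Eq. (9): similarity-weighted average of the k nearest neighbors' profiles.

    i is the index of a new row entity (e.g. a new circRNA), S the row-entity
    similarity matrix and Y the association matrix.
    """
    others = np.array([l for l in range(S.shape[0]) if l != i])
    order = others[np.argsort(-S[i, others], kind="stable")]   # most similar first
    k = dwnn_k(S[i, order], eps)
    nbrs = order[:k]
    w = S[i, nbrs]
    if k == 0 or w.sum() == 0:
        return np.zeros(Y.shape[1])
    return w @ Y[nbrs] / w.sum()

S = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.5],
              [0.2, 0.5, 1.0]])
Y = np.array([[0, 0],          # row 0: a new circRNA with no known association
              [1, 0],
              [0, 1]], dtype=float)
```

With ε = 1 both neighbors are kept and the new circRNA receives the weighted profile [0.8, 0.2]. With ε = 0.5 the second neighbor fails the test, since (1 - 0.2)/2 = 0.4 > 0.5² = 0.25, so only the most similar circRNA is used and the scores become [1.0, 0.0].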

Similarly, the initial association score between a new disease dj and a circRNA ci is computed as follows:

$$ y\left(c_{i},d_{j}\right) = \frac{\sum G{^{jl}_{d}}y_{il}}{\sum G{^{jl}_{d}}}, d_{l} \in N{(d_{j})} $$
(11)

where \(G_{d}^{jl}=G_{d}(d_{j},d_{l})\) and N(dj) is the set of the \(k_{d_{j}}\) nearest neighbors of the new disease dj. The parameter \(k_{d_{j}}\) is calculated in the same way:

$$ k_{d_{j}} = \begin{cases} \max\left\{k : \frac{1-simset(d_{j})_{l}}{l} \le \epsilon^{l} \ \text{for all } 1 \le l \le k\right\}, & \text{if such a } k \text{ exists} \\ 0, & \text{otherwise} \end{cases} $$
(12)

where \(simset(d_{j})_{l}\) is the l-th value of the vector of similarities between dj and the other diseases, sorted from high to low, and ε again controls the range for selecting neighbors.

Kronecker product kernel-based regularized least squares (RLS-Kron)

In this study, we use the RLS-Kron method to predict new circRNA-disease associations [38, 39, 43]. Given the kernel K, the predicted circRNA-disease association matrix has a simple closed-form solution:

$$ vec\left({\hat Y}^{T}\right) = K{\left(K+\sigma I\right)}^{-1}vec\left(Y^{T}\right) $$
(13)

in which σ is a regularization parameter, set to 0.2 in this study. RLS-Kron has no prediction ability when σ = 0: if K is invertible, \(K(K+\sigma I)^{-1}\) then reduces to the identity and the method simply reproduces the known associations Y. The kernel K is the Kronecker product \(K_{c} \otimes K_{d}\) of the circRNA kernel and the disease kernel, defined element-wise as follows:

$$ K\left(\left(c_{i},d_{j}\right),\left(c_{u},d_{v}\right)\right) = K_{c}(c_{i},c_{u})K_{d}(d_{j},d_{v}) $$
(14)

where the matrices Kc and Kd are the similarity matrices of circRNAs and diseases, respectively. Computing the predicted matrix directly from Eq. (13) requires inverting an \(N_{c}N_{d} \times N_{c}N_{d}\) matrix, which for the benchmark dataset has 676 × 100 = 67,600 rows and columns (about 36.6 GB for the matrix alone in double precision). We therefore use an efficient approach based on eigendecomposition. By matrix theory, the eigenvalues (eigenvectors) of a Kronecker product are the Kronecker products of the eigenvalues (eigenvectors) of its factors. Specifically, the kernel can be decomposed as follows:

$$ K = K_{c} \otimes K_{d} = V \Lambda V^{T} $$
(15)

where \(\Lambda=\Lambda_{c}\otimes\Lambda_{d}\) and \(V=V_{c}\otimes V_{d}\) are obtained from the eigendecompositions of the two kernel matrices Kc and Kd. As Kc and Kd are real symmetric matrices, their eigendecompositions are:

$$ K_{c}=V_{c}\Lambda_{c}V_{c}^{T} $$
(16)
$$ K_{d}=V_{d}\Lambda_{d}V_{d}^{T} $$
(17)

where \(V_{c}\) and \(V_{d}\) are orthogonal matrices whose columns are the eigenvectors of Kc and Kd, respectively, and \(\Lambda_{c}\) and \(\Lambda_{d}\) are diagonal matrices whose diagonal entries are the corresponding eigenvalues. The final predicted circRNA-disease association matrix \(\hat{Y}\) can then be calculated as follows:

$$ \hat{Y} = V_{c}Z^{T}V_{d}^{T} $$
(18)
$$ vec(Z) = (\Lambda_{c} \otimes \Lambda_{d})(\Lambda_{c} \otimes \Lambda_{d}+ \sigma I)^{-1}vec\left(V_{d}^{T}Y^{T}V_{c}\right) $$
(19)
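The eigendecomposition route of Eqs. (15)-(19) can be sketched in NumPy as follows, using `numpy.linalg.eigh` for the symmetric kernels; the function name `rls_kron` is illustrative. Only the Nc × Nc and Nd × Nd kernels are decomposed, so the \(N_{c}N_{d} \times N_{c}N_{d}\) kernel is never formed. In the setting of this paper, Kc and Kd would be \(S_{c}\) and \(S_{d}\), with σ = 0.2. The check at the end compares the result with the direct formula of Eq. (13) on tiny random kernels, where forming K is still cheap.

```python
import numpy as np

def rls_kron(Kc, Kd, Y, sigma=0.2):
    """RLS-Kron predictions via eigendecomposition (Eqs. 15-19).

    Kc: (Nc, Nc) circRNA kernel, Kd: (Nd, Nd) disease kernel, Y: (Nc, Nd).
    """
    lam_c, Vc = np.linalg.eigh(Kc)                 # Kc = Vc diag(lam_c) Vc^T  (Eq. 16)
    lam_d, Vd = np.linalg.eigh(Kd)                 # Kd = Vd diag(lam_d) Vd^T  (Eq. 17)
    # Diagonal of Lambda_c ⊗ Lambda_d laid out as an (Nd, Nc) grid matching vec(Z).
    L = np.outer(lam_d, lam_c)
    Z = (L / (L + sigma)) * (Vd.T @ Y.T @ Vc)      # Eq. (19), applied element-wise
    return Vc @ Z.T @ Vd.T                         # Eq. (18)

# Sanity check on tiny random kernels against the direct formula, Eq. (13).
rng = np.random.default_rng(0)
A, B = rng.random((4, 4)), rng.random((3, 3))
Kc, Kd = A @ A.T, B @ B.T                          # symmetric positive semidefinite kernels
Y = (rng.random((4, 3)) > 0.5).astype(float)
K = np.kron(Kc, Kd)                                # row index = i_c * Nd + i_d
# Y.ravel() (row-major) equals vec(Y^T) in the column-stacking notation of Eq. (13).
direct = (K @ np.linalg.solve(K + 0.2 * np.eye(12), Y.ravel())).reshape(4, 3)
fast = rls_kron(Kc, Kd, Y, sigma=0.2)
```

With `sigma=0.0` the same function returns Y (up to round-off) for invertible kernels, which illustrates why σ must be positive. Note that \(S_{d}\), as the mean of a GIP kernel and a semantic similarity matrix, is not guaranteed to be positive semidefinite; an eigenvalue product close to -σ would make Eq. (19) numerically unstable.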

Results

Performance evaluation

In this study, we conduct 5CV, 10CV and LOOCV to evaluate the performance of DWNN-RLS in predicting new circRNA-disease associations, using the AUC (area under the ROC curve) as the evaluation metric.

We perform 10 repetitions of both 10CV and 5CV. In 10CV, the known circRNA-disease associations are randomly divided into 10 folds; each fold in turn serves as the test set while the remaining folds form the training set. 5CV proceeds in the same way with 5 folds. In LOOCV, each known circRNA-disease association is in turn used as the test sample while the remaining known associations form the training set. Larger AUC values indicate better prediction ability, whereas an AUC of 0.5 or lower means the method performs no better than random guessing.
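A minimal sketch of this protocol, under assumptions the text does not settle: unknown circRNA-disease pairs serve as negatives, per-fold AUCs are averaged (pooling all test scores into a single ROC curve is a common alternative), and `predict` is a hypothetical callable that maps a training matrix to a score matrix, for example DWNN-RLS. The AUC is computed through its rank interpretation, the probability that a randomly chosen positive scores above a randomly chosen negative with ties counted as one half, which is why a scorer that cannot separate the classes gets 0.5.

```python
import numpy as np

def auc_score(pos, neg):
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def cross_validate_auc(Y, predict, n_folds=5, seed=0):
    """k-fold CV over the known associations; n_folds=int(Y.sum()) gives LOOCV.

    predict(Y_train) must return a score matrix shaped like Y. Test positives
    are ranked against all pairs with no known association (an assumed
    negative set, since unknown pairs are not confirmed negatives).
    """
    rng = np.random.default_rng(seed)
    pos_idx = np.argwhere(Y == 1)
    rng.shuffle(pos_idx)                           # random fold assignment
    folds = np.array_split(pos_idx, n_folds)
    neg_mask = (Y == 0)
    aucs = []
    for fold in folds:
        Y_train = Y.copy()
        Y_train[fold[:, 0], fold[:, 1]] = 0        # hide the test associations
        scores = predict(Y_train)
        aucs.append(auc_score(scores[fold[:, 0], fold[:, 1]], scores[neg_mask]))
    return float(np.mean(aucs))
```

For 10 repetitions of 10CV one would call `cross_validate_auc(Y, predict, n_folds=10, seed=r)` for r = 0, ..., 9 and average the results.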

Comparison with other methods

As no computational method for predicting circRNA-disease associations had been reported in the literature, we assess DWNN-RLS by comparing it against six methods that have proved effective in related prediction problems: RLS-avg [38], RLS-Kron [38], NetLapRLS [44], KATZ [45, 46], NBI [47] and WP [47, 48]. We briefly review them here. RLS-avg averages the outputs computed separately from the two kernels. RLS-Kron computes prediction scores with the Kronecker product kernel-based regularized least squares approach. NetLapRLS predicts associations by exploiting the similarities of links and nodes. KATZ is a network-based method that considers the number and lengths of walks between nodes in a heterogeneous network to predict associations. NBI is another network-based method that infers new associations using only the topology of the circRNA-disease bipartite network. WP, like DBSI, is a recommendation model that directly uses the similarities of circRNAs and diseases.

Figure 1 shows the ROC curves of the seven prediction methods on the CircR2Disease dataset in 5CV. DWNN-RLS attains the highest AUC among the seven methods (0.8854, versus 0.8224 for the second-best method, KATZ), indicating that it outperforms the other methods.

Fig. 1
The ROC curves of the seven methods in 5CV

Figure 2 shows the ROC curves of the seven prediction methods in 10CV on the CircR2Disease dataset. The AUC of DWNN-RLS reaches 0.9205, which is higher than those of the other methods (RLS-avg: 0.7477, RLS-Kron: 0.8103, NetLapRLS: 0.6744, KATZ: 0.8343, NBI: 0.6648, WP: 0.6198).

Fig. 2
The ROC curves of the seven methods in 10CV

Figure 3 compares DWNN-RLS with the six other methods in LOOCV on the CircR2Disease dataset. As Fig. 3 shows, the AUC of DWNN-RLS (0.9701) is higher than those of the other methods (RLS-avg: 0.9169, RLS-Kron: 0.9088, NetLapRLS: 0.6905, KATZ: 0.8432, NBI: 0.699, WP: 0.6362).

Fig. 3
The ROC curves of the seven methods in LOOCV

Note that DWNN-RLS achieves higher AUCs in 10CV and LOOCV than in 5CV; because these settings leave more known associations for training, this suggests that DWNN-RLS benefits from a larger set of known circRNA-disease associations. In addition, the semantic similarity of diseases improves the prediction performance of DWNN-RLS. When only the GIP similarity is used, the AUCs of DWNN-RLS are 0.8368, 0.8819 and 0.9423 in 5CV, 10CV and LOOCV, respectively; when the GIP similarity is combined with the disease semantic similarity, they increase to 0.8854, 0.9205 and 0.9701. Even with the GIP similarity alone, DWNN-RLS outperforms RLS-Kron (10CV: 0.8819 vs 0.8103; LOOCV: 0.9423 vs 0.9088), which suggests that the DWNN step also improves prediction performance. Compared with KATZ, NBI and WP, DWNN-RLS is a machine learning model with an explicit objective function and solution process, which we believe helps it achieve better prediction performance.

Parameter analysis for ε and σ

To assess the robustness of DWNN-RLS, we analyze the influence of the parameters ε and σ on its prediction performance in 10CV. The parameter ε controls the range for selecting the k nearest neighbors of circRNAs and diseases, and σ is the regularization parameter of DWNN-RLS. When analyzing σ, ε is set to 1.0; when analyzing ε, σ is set to its default value of 0.2. Table 1 shows the 10CV prediction performance of DWNN-RLS with σ = 0.2 as ε ranges from 0.1 to 1.0 in increments of 0.1. Performance is best when ε is 1.0, that is, when all neighbors of circRNAs and diseases are used to calculate their initial association scores.

Table 1 The 10CV prediction performance for values of ε from 0.1 to 1.0 in increments of 0.1; the best result is in bold face

Table 2 shows the prediction performance of DWNN-RLS for different values of σ with ε set to 1.0. DWNN-RLS obtains the best prediction performance when σ is 0.2, so we use 0.2 as the default value of σ in this study.

Table 2 The 10CV prediction performance for values of σ from 0.1 to 1.0 in increments of 0.1; the best result is in bold face

Case studies

Having confirmed the prediction performance and robustness of DWNN-RLS in 10CV, 5CV and LOOCV, we further analyze its ability to discover new circRNA-disease associations. All known circRNA-disease associations in the CircR2Disease dataset are used as the training set, and all other circRNA-disease pairs are treated as candidate associations. We apply DWNN-RLS to compute prediction scores for these candidate pairs and analyze the results for atherosclerotic vascular disease and breast cancer.
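The candidate ranking behind these case studies can be sketched as follows: for one disease column, pairs that are already known are excluded and the remaining circRNAs are sorted by predicted score. Here `Y_hat` stands for the DWNN-RLS score matrix trained on all known associations; the function name, the toy scores and the circRNA names are hypothetical. Tables 3 and 4 correspond to k = 10.

```python
import numpy as np

def top_candidates(Y_hat, Y, disease_col, circ_names, k=10):
    """Rank circRNAs for one disease by predicted score, skipping known associations."""
    scores = Y_hat[:, disease_col]
    candidates = np.flatnonzero(Y[:, disease_col] == 0)          # unknown pairs only
    ranked = candidates[np.argsort(-scores[candidates], kind="stable")]
    return [(circ_names[i], float(scores[i])) for i in ranked[:k]]

# Toy example with hypothetical names and scores.
Y = np.array([[1, 0], [0, 0], [0, 1], [0, 0]])
Y_hat = np.array([[0.9, 0.1], [0.4, 0.3], [0.2, 0.8], [0.6, 0.05]])
names = ["c1", "c2", "c3", "c4"]
print(top_candidates(Y_hat, Y, 0, names, k=2))   # [('c4', 0.6), ('c2', 0.4)]
```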

Atherosclerotic vascular disease, which encompasses coronary heart disease, cerebrovascular disease and peripheral arterial disease, is responsible for the majority of cases of cardiovascular disease (CVD) in both developing and developed countries, and CVD is the leading cause of death and disability worldwide [49, 50]. Table 3 shows that 2 of the top 10 predicted associations are confirmed by previous literature. Elevated cANRIL expression can aggravate inflammation of endothelial cells (ECs), exacerbating atherosclerosis (AS) [51]. The circRNA cANRIL is transcribed at the atherosclerotic cardiovascular disease locus on chromosome 9p21; it induces nucleolar stress and apoptosis and inhibits proliferation in smooth muscle cells and macrophages [52]. The circRNA cZNF292 is also associated with atherosclerotic cardiovascular disease, as it stimulates angiogenesis through vascular sprouting and cell proliferation [53].

Table 3 The validation results of predicted top 10 new circRNA-disease associations of Atherosclerotic vascular disease

Approximately 1 in 12 women in Western Europe and the United States develops breast cancer, a disease characterized by a distinct metastatic pattern involving the regional lymph nodes, bone marrow, lung and liver [54, 55]. Table 4 shows the validation results of the top 10 new circRNA-disease associations for breast cancer predicted by DWNN-RLS; 3 of them are validated by previous studies. Both circGFRA1 and GFRA1 act as ceRNAs in triple-negative breast cancer by regulating miR-34a [56]. When the human breast cancer cell line MDA-MB-231 was stably transfected with circ-Foxo3, ectopic expression of the Foxo3 circular RNA suppressed tumor growth, cancer cell proliferation and survival [25]. CDR1as contains more than 70 selectively conserved target sites of miR-7, a miRNA that can directly downregulate oncogenes in cancers such as breast cancer [57].

Table 4 The validation results of predicted top 10 new circRNA-disease associations of Breast cancer

The case studies above also show that a number of predictions have not yet been confirmed in the literature. One possible reason is that the CircR2Disease database is still limited and relevant new studies have not yet been published. These predicted circRNA-disease associations therefore deserve further study.

Discussion

With advances in RNA-seq, high-throughput sequencing and other techniques, important progress has been made in understanding the characteristics and functions of circRNAs. CircRNAs may play key roles in diseases as miRNA sponges or decoys, as protein sponges or decoys, and as regulators of gene transcription. Systematically understanding the associations between circRNAs and diseases has therefore become an important issue in bioinformatics research, one that benefits disease diagnosis and treatment. Although several circRNA databases have been established in recent years, they rarely focus on associations between circRNAs and diseases, and computational methods for predicting circRNA-disease associations are lacking because of these limitations. To our knowledge, CircR2Disease is the first manually curated database of circRNA-disease associations, and it provides an opportunity to develop effective methods for identifying novel associations between circRNAs and diseases.

Conclusion

We developed DWNN-RLS to predict new associations between circRNAs and diseases on the CircR2Disease dataset. First, DWNN-RLS computes the GIP similarities of circRNAs and diseases based on the known circRNA-disease associations. Second, it computes the semantic similarity of diseases and takes the mean of the GIP and semantic similarities as the final disease similarity. In addition, DWNN is used to calculate initial association scores for new circRNAs and diseases. Finally, RLS-Kron predicts novel circRNA-disease associations based on these similarities. We evaluated DWNN-RLS with 10CV, 5CV and LOOCV and compared it with six other methods; DWNN-RLS achieved the best prediction performance in all three settings. We also showed that DWNN-RLS may achieve better prediction performance as more known circRNA-disease associations become available. Case studies further illustrate the prediction performance of DWNN-RLS.

DWNN-RLS nevertheless has some limitations. CircRNAs can function as miRNA sponges or decoys and as protein sponges or decoys, yet in this study we use only the GIP similarity of circRNAs. In the future, the similarity computation of circRNAs could incorporate more relevant biological network information, such as circRNA-miRNA associations and sequence information. Similarly, disease functional information should also be considered [58–60]. Once more biological network information, such as circRNA-miRNA associations, circRNA sequence information and disease functional information, is integrated, recent matrix factorization methods such as NRLMF [61], SRMF [62] and DRRS [63] should also be considered for predicting circRNA-disease associations. To further improve prediction performance, we plan to develop a more effective approach that reasonably integrates such biological network information to discover new circRNA-disease associations.