This paper proposes DRDA, a drug repositioning algorithm based on a deep autoencoder and adaptive fusion, to address data sparseness and the inefficient fusion of multisource data in drug repositioning. First, a deep autoencoder reduces the dimension of the drug features, which comprise drug chemical structures and drug target proteins, to extract more abstract representations of those features. Then, drug similarity is computed from the drug features and the drug side-effect data, and disease similarity is computed from the drug-disease association data. The drug-disease association is predicted using a top-k similar neighbor method adapted to drug repositioning. Finally, the predictions computed from the various data sources are combined by adaptive fusion to give the final drug-disease association prediction. The algorithm framework is shown in Fig. 1.
Dimension reduction in drug features based on the deep autoencoder
To reduce the sparseness of the drug chemical structure and drug target protein data, the two types of data were integrated into drug feature data. Then, more abstract drug features were extracted by reducing the dimension of the drug features with a deep autoencoder, which decreases data sparseness. The deep autoencoder is an extension of the autoencoder: it converts high-dimensional data to low-dimensional data via a multilayer encoding network and recovers the input from the encoding with a similar decoding network. The network is trained by minimizing the error between the original input and the reconstructed data [11]. A deep autoencoder suited to drug feature data was designed accordingly. Its structure is shown in Fig. 2.
The deep autoencoder is composed of an encoder and a decoder, and "Adagrad" [12] is used as the optimization method. The encoder consists of an input layer and three building layers [13]. Each building layer is composed of a fully connected layer and a dropout (discard) layer with the dropout parameter set to 0.5, and the last building layer is the coding layer. ReLU is used as the activation function. When dimension reduction is performed with drug feature data \(x \in R^{m \times n}\) as input, the output of a building layer is computed as in Eq. (1):
$$g^{i} (x) = f(\omega^{i} g^{*} (x) + b^{i} )$$
(1)
where \(g^{i} (x)\) is the output of the ith building layer; \(f( \cdot )\) is a nonlinear activation function; \(\omega^{i}\) and \(b^{i}\) are the weight and bias of the ith building layer; \(g^{*} (x) = \left\{ {\begin{array}{*{20}c} x & {i = 1} \\ {g^{i - 1} (x)} & {i > 1} \\ \end{array} } \right.\) is the input to that layer; and i is the index of the building layer.
The decoder consists of three building layers and an output layer; its first building layer is the encoding layer. ReLU is used as the activation function in all layers except the output layer, which uses the sigmoid function. Because drug feature data are binary, the reconstructed output should approach 0 or 1 so that the error between the reconstructed output and the input is reduced as quickly as possible. Most output values of the sigmoid function are concentrated near 0 and 1 (Fig. 3), which matches the form required of the output. The decoder building layers are computed in the same way as the encoder building layers. The decoder output layer is computed as in Eq. (2):
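As an illustration of Eq. (1), the following minimal sketch chains building layers for a single sample in plain Python. The function names, the toy weights, and the use of inverted dropout (rescaling the kept units) are assumptions made for the example; they are not taken from the original implementation.

```python
import random


def relu(z):
    """ReLU activation applied element-wise to a list of floats."""
    return [max(0.0, v) for v in z]


def building_layer(g_prev, weights, bias, drop_rate=0.0, rng=None):
    """Eq. (1): f(w^i g*(x) + b^i) for one sample, then optional dropout.

    `weights` holds one row per output unit; `g_prev` is the previous
    layer's output (the raw input x when i = 1).
    """
    pre_activation = [
        sum(w * g for w, g in zip(row, g_prev)) + b
        for row, b in zip(weights, bias)
    ]
    out = relu(pre_activation)
    if drop_rate > 0.0:
        rng = rng or random.Random()
        keep = 1.0 - drop_rate
        # Assumption: inverted dropout, i.e. kept units are rescaled by 1/keep.
        out = [v / keep if rng.random() < keep else 0.0 for v in out]
    return out


def encode(x, layers):
    """Chain the layers: g^1(x), then g^i applied to g^{i-1}(x) for i > 1."""
    g = x
    for weights, bias in layers:
        g = building_layer(g, weights, bias)  # dropout switched off here
    return g


# Toy example: 3 input features -> 2 hidden units -> 1 code unit.
x = [1.0, 0.0, 1.0]
layers = [
    ([[0.5, -1.0, 0.5], [1.0, 1.0, -2.0]], [0.0, 0.5]),
    ([[1.0, 1.0]], [-0.5]),
]
code = encode(x, layers)
```

In the network described above, dropout with rate 0.5 would be active only during training; the sketch disables it when encoding so that the extracted features are deterministic.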
$$\mathop x\limits^{\_} = g^{4} (x) = f(\omega^{3} g^{3} (x) + b^{3} )$$
(2)
The mean square error is used as the loss function of the deep autoencoder. Minimizing the mean square error decreases the gap between the input x and the reconstructed output \(\mathop x\limits^{\_}\). Once the error has been minimized, the output of the encoding layer is taken as the feature data after dimension reduction.
The batch size used for training is 16, and the learning rate of the "Adagrad" optimizer is 0.01. Before training, the parameters are initialized with the Xavier method [14]. The 400-dimensional feature data extracted from the output of the encoding layer are used for the subsequent computation of drug similarity.
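The role of the sigmoid output layer and the mean square error can be sketched as follows. The helper names and the toy weights are hypothetical; the sketch only shows that a sigmoid output close to 0 or 1 gives a small reconstruction error on binary input.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def output_layer(g_prev, weights, bias):
    """Decoder output layer: sigmoid of an affine map of the last hidden output."""
    return [
        sigmoid(sum(w * g for w, g in zip(row, g_prev)) + b)
        for row, b in zip(weights, bias)
    ]


def mean_square_error(x, x_rec):
    """Mean of the squared differences between input and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)


# Toy example: a binary input and a reconstruction pushed towards 1 and 0.
x = [1.0, 0.0]
x_rec = output_layer([2.0], [[3.0], [-3.0]], [0.0, 0.0])
loss = mean_square_error(x, x_rec)
```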
Similarity computation
Drug similarity
Dudley et al. [15] and Li et al. [16] stated that drug chemical structure and drug target proteins play a critical role in calculating drug similarity because they are quantitatively related. Drugs with similar target proteins can also treat similar diseases. Drug chemical structure data and drug target protein data were sampled from PubChem [17] and the UniProt Knowledgebase [18], respectively. After dimension reduction of the drug feature data with the aforementioned deep autoencoder, drug features that are denser than the original data are obtained. Then, cosine similarity is used to calculate drug similarity, as shown in Eq. (3):
$$sim(d,d*) = \frac{{\sum\limits_{i = 1}^{n} {{\text{f}}_{{d_{i} }} \cdot {\text{f}}_{{d_{i}^{*} }} } }}{{\sqrt {\sum\limits_{i = 1}^{n} {{\text{f}}_{{d_{i} }}^{2} } } \cdot \sqrt {\sum\limits_{i = 1}^{n} {{\text{f}}_{{d_{i}^{*} }}^{2} } } }}$$
(3)
where \(sim(d,d*)\) represents the similarity of the drug \(d\) and the drug \(d*\); \({\text{f}}_{{d_{i} }}\) and \({\text{f}}_{{d_{i}^{*} }}\) stand for the value of the ith drug feature in the drug \(d\) and the drug \(d*\), respectively; and n is the feature dimension.
The literature [19] proposed using drug side effect data to verify whether two drugs act on the same target and designed a series of experiments to demonstrate the feasibility of inferring molecular interactions from side effect data. Thus, drug side effect data can be used to calculate drug similarity. Drug side effect data were sampled from the SIDER [20] database. If a drug causes a specific side effect, the corresponding value is set to 1; otherwise, it is set to 0. Similarity is computed with the Tanimoto coefficient, as shown in Eq. (4):
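Eq. (3) can be written directly as a short function. This is a minimal sketch; returning 0 for an all-zero feature vector is an assumed convention, since Eq. (3) is undefined in that case.

```python
import math


def cosine_similarity(f_d, f_d_star):
    """Eq. (3): cosine similarity between two drug feature vectors."""
    dot = sum(a * b for a, b in zip(f_d, f_d_star))
    norm = math.sqrt(sum(a * a for a in f_d)) * math.sqrt(sum(b * b for b in f_d_star))
    # Assumption: similarity is 0 when either vector is all zeros.
    return dot / norm if norm > 0 else 0.0


same_direction = cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Here `same_direction` is 1 up to rounding, because the second vector is a scaled copy of the first, and `orthogonal` is 0.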
$$sim(d,d*) = \frac{{{|}I_{dd*} {|}}}{{|I_{d} | + |I_{{d^{*} }} | - |I_{{dd^{*} }} |}}$$
(4)
where \(|I_{dd*}|\) is the number of side effects shared by the two drugs, and \(|I_{d}|\) and \(|I_{{d^{*} }}|\) are the numbers of side effects of drug \(d\) and drug \(d*\), respectively.
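A minimal sketch of Eq. (4) on binary side effect profiles is given below. The profiles are invented for illustration, and returning 0 when neither drug has any recorded side effect is an assumed convention.

```python
def tanimoto(profile_a, profile_b):
    """Eq. (4): Tanimoto coefficient of two binary (0/1) profiles."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    total = sum(profile_a) + sum(profile_b) - shared
    # Assumption: similarity is 0 when both profiles are empty.
    return shared / total if total > 0 else 0.0


# Hypothetical profiles over four side effects.
drug_d = [1, 1, 0, 1]
drug_d_star = [1, 0, 0, 1]
similarity = tanimoto(drug_d, drug_d_star)  # 2 shared / (3 + 2 - 2)
```

The same function applies to Eq. (5) when each profile is a disease's 0/1 column of the drug-disease association data.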
Disease similarity
An inference relevant to drug repositioning was proposed in the literature [21]: two diseases are deemed similar when many of the same drugs can treat both. Following the method proposed in the literature [7], disease similarity is computed from drug-disease association data sampled from UMLS [22]. The data are binary: if a drug has a treatment effect on a disease, the value is set to 1; otherwise, it is set to 0. The Tanimoto coefficient is used for the computation, as shown in Eq. (5):
$$sim(e,e*) = \frac{{{|}I_{ee*} {|}}}{{|I_{e} | + |I_{{e^{*} }} | - |I_{{ee^{*} }} |}}$$
(5)
where \(sim(e,e*)\) is the similarity of the two diseases; \(|I_{ee*}|\) is the number of drugs that can treat both diseases; and \(|I_{e}|\) and \(|I_{{e^{*} }}|\) are the numbers of drugs that can treat the diseases e and e*, respectively.
Computation of prediction
The original drug-disease association data contain only "0" and "1" relationships between drugs and diseases, although a side-effect relationship may also exist between a drug and some diseases. To compute the drug-disease association prediction effectively, the known drug-side effect relationships in the drug-side effect data were marked in the drug-disease data: if a side effect (disease) appears in both the drug-disease association data and the drug-side effect data and the drug produces that side effect, the corresponding drug-disease entry is changed from "0" to "−1" in the drug-disease association data.
As shown in the literature [7, 23], the prediction of drug-disease association in drug repositioning can be computed using collaborative filtering. However, conventional collaborative filtering with top-k neighbors cannot compute the drug-disease prediction accurately because the data used in drug repositioning are typically sparse and because side effect relationships exist between drugs and diseases. In this paper, the drug or disease being evaluated is itself included among its top-k similar neighbors. When data sparseness leaves only a small number of effective neighbors (i.e., neighbors whose similarity is not 0), the known drug-disease association information of the drug or disease itself is decisive, which avoids false predictions resulting from a lack of effective neighbors. When there are sufficient effective neighbors, the drug or disease itself counts as only one of the neighbors, so its known association information has a small effect on predicting new uses of the drug. Fusing known information with the prediction thus lowers the prediction error caused by a lack of effective neighbors due to data sparseness, and it also avoids the breakdown of the computation (a zero denominator) that would occur if the number of effective neighbors were zero.
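The marking step can be sketched as follows. The nested-dictionary layout, the function name, and the assumption that side effects and diseases are matched by an identical term name are all choices made for this example; the source does not specify how the two vocabularies are aligned.

```python
def mark_side_effects(drug_disease, drug_side_effect):
    """Return a copy of the association data with known side effects marked as -1.

    drug_disease: dict mapping drug -> {disease term: 0 or 1}
    drug_side_effect: dict mapping drug -> set of side effect terms
    Assumption: a side effect and a disease are the same entity when their
    terms are identical strings.
    """
    marked = {drug: dict(row) for drug, row in drug_disease.items()}
    for drug, row in marked.items():
        for term in drug_side_effect.get(drug, ()):
            # Only unknown ("0") entries are changed; treatments ("1") are kept.
            if row.get(term) == 0:
                row[term] = -1
    return marked


# Hypothetical toy data.
drug_disease = {
    "drugA": {"nausea": 0, "asthma": 1},
    "drugB": {"nausea": 0, "asthma": 0},
}
drug_side_effect = {"drugA": {"nausea", "headache"}}
marked = mark_side_effects(drug_disease, drug_side_effect)
```

In this toy case only the (drugA, nausea) entry becomes −1; "headache" is ignored because it does not occur as a disease, and the input dictionary is left unchanged.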
Drug-disease associated prediction can be computed through drug similarity, as shown in Eq. (6):
$$P_{de}^{k} = \frac{{\sum\limits_{{d^{*} \in {\text{NN}}^{\prime}}} {sim^{k} (d,d^{*} ) \times s_{{d^{*} e}} } }}{{\sum\limits_{{d^{*} \in {\text{NN}}^{\prime}}} {sim^{k} (d,d^{*} )} }}$$
(6)
where \(P_{de}^{k}\) is the predicted score between drug d and disease e computed based on data source k; \({\text{NN}}^{\prime}\) is the union set of drug \(d\) and its top-k neighbors; and \(s_{{d^{*} e}}\) is the relational value between the drug \(d^{*}\) and the disease e upon integrating drug-side effect data in the drug-disease associated dataset.
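A minimal sketch of Eq. (6) follows. The dictionary layout and names are illustrative, and the self-similarity of drug d is assumed to be 1, which holds for the cosine similarity of any non-zero vector with itself.

```python
def predict_from_drug_similarity(d, e, sim, s, k):
    """Eq. (6): similarity-weighted average over drug d and its top-k neighbors.

    sim: dict mapping drug -> {other drug: similarity} for one data source
    s:   dict mapping drug -> {disease: value in {-1, 0, 1}}
    """
    ranked = sorted(
        (other for other in sim[d] if other != d),
        key=lambda other: sim[d][other],
        reverse=True,
    )
    weights = {other: sim[d][other] for other in ranked[:k]}
    # NN' also contains d itself; assumption: sim(d, d) = 1.
    weights[d] = 1.0
    numerator = sum(w * s[other][e] for other, w in weights.items())
    denominator = sum(weights.values())  # at least 1, so never zero
    return numerator / denominator


# Hypothetical toy data: B treats e, e is a side effect of C, A is unknown.
sim = {"A": {"B": 0.8, "C": 0.5, "D": 0.0}}
s = {"A": {"e": 0}, "B": {"e": 1}, "C": {"e": -1}, "D": {"e": 1}}
score = predict_from_drug_similarity("A", "e", sim, s, k=2)
```

With k = 2 the neighbors are B and C, so the score is (0.8 · 1 + 0.5 · (−1) + 1 · 0) / (0.8 + 0.5 + 1). The self term keeps the denominator positive even when every neighbor similarity is 0.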
The computation of the drug-disease associated prediction for disease similarity is similar to that of drug similarity, as shown in Eq. (7):
$$P_{de}^{k} = \frac{{\sum\limits_{{e^{*} \in {\text{NN}}^{\prime}}} {sim^{k} (e,e^{*} ) \times s_{de*} } }}{{\sum\limits_{{e^{*} \in {\text{NN}}^{\prime}}} {sim^{k} (e,e^{*} )} }}$$
(7)
where \({{\text{NN}}^{\prime}}\) is the union set of disease e and its top-k neighbors.
The drug-disease associated prediction that finally fuses multiple data sources is calculated as Eq. (8):
$$P_{de}^{*} = \sum\limits_{k = 1}^{{\text{K}}} {\beta_{k} \times P_{de}^{k} }$$
(8)
where \(P_{de}^{*}\) is the fused prediction for drug d and disease e; \(\beta_{k}\) is the weight of the kth data source; K is the number of data sources; and \(P_{de}^{k}\) is the prediction for drug d and disease e computed from the kth data source.
Weight computation
A weight computation method suited to drug repositioning was designed following the literature [7] to fuse the predictions computed from multiple data sources. The best prediction is sought by maximizing the difference between the prediction and the known value for drug-disease combinations whose associated value is 0 while minimizing the difference between the prediction and the known value for combinations whose associated value is 1. The weight computation can be expressed as the optimization objective in Eq. (9):
$$\begin{aligned} \mathop {\arg \min }\limits_{{\beta_{k} }} L(\beta_{k} ) & = \sum\limits_{k = 1}^{K} {\beta_{k}^{2} } \left( {\sum\nolimits_{{\{ (d,e)|s_{de} = 1\} }} {\left( {s_{de} - p_{de}^{k} } \right)^{2} } - \sum\nolimits_{{\{ (d,e)|s_{de} = 0\} }} {\left( {s_{de} - p_{de}^{k} } \right)^{2} } } \right) \\ {\text{s.t.}}\,\sum\limits_{k = 1}^{K} {\beta_{k} } & = 1, \\ \beta_{k} & > 0 \\ \end{aligned}$$
(9)
where \(\{ (d,e)|s_{de} = 1\}\) is the set of drug-disease combinations whose associated value is 1 in the drug-disease association data, and \(\{ (d,e)|s_{de} = 0\}\) is the set of combinations whose associated value is 0. This formula is intended to increase the weight of data sources that can make the predicted value as large as possible. The term \(\sum\nolimits_{{\{ (d,e)|s_{de} = 1\} }} {(s_{{d{\text{e}}}} - p_{de}^{k} )^{2} }\) indicates that when drug d can treat disease e in the known data, the difference between the predicted value and the drug-disease associated value must be reduced as much as possible, which keeps the predicted value near "1". Similarly, the term \({ - }\sum\nolimits_{{\{ (d,e)|s_{de} = 0\} }} {(s_{de} - p_{de}^{k} )}^{2}\) indicates that when drug d is not known to treat disease e, the predicted value should be made as large as possible, because drug repositioning rests on predicting new uses of a drug. In this way, drug-disease combinations with known relationships are predicted well while more reasonable and effective predictions are obtained for unknown drug-disease combinations. The optimization problem is solved using the minimization routine of the SciPy package for Python.
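One way to set up Eq. (9) with `scipy.optimize.minimize`, followed by the fusion of Eq. (8), is sketched below. The choice of the SLSQP method, the small lower bound `eps` standing in for the strict constraint \(\beta_{k} > 0\), the uniform starting point, and the toy data are assumptions; the source states only that SciPy's minimization routine is used.

```python
import numpy as np
from scipy.optimize import minimize


def source_costs(s, predictions):
    """Bracketed term of Eq. (9) for each data source k.

    Entries marked -1 belong to neither sum, matching the two sets in Eq. (9).
    """
    pos, zero = (s == 1), (s == 0)
    return np.array([
        np.sum((s[pos] - p[pos]) ** 2) - np.sum((s[zero] - p[zero]) ** 2)
        for p in predictions
    ])


def fit_weights(costs, eps=1e-6):
    """Minimize sum_k beta_k^2 * c_k subject to sum(beta) = 1 and beta_k >= eps."""
    n_sources = len(costs)
    result = minimize(
        lambda beta: float(np.sum(beta ** 2 * costs)),
        x0=np.full(n_sources, 1.0 / n_sources),
        method="SLSQP",  # assumption: any constrained solver could be used
        bounds=[(eps, 1.0)] * n_sources,
        constraints=[{"type": "eq", "fun": lambda beta: np.sum(beta) - 1.0}],
    )
    return result.x


def fuse(predictions, beta):
    """Eq. (8): weighted sum of the per-source prediction matrices."""
    return sum(b * p for b, p in zip(beta, predictions))


# Hypothetical toy data: two drugs, two diseases, two data sources.
s = np.array([[1, 0], [0, 1]])
predictions = [
    np.array([[0.9, 0.1], [0.1, 0.8]]),
    np.array([[0.6, 0.1], [0.2, 0.5]]),
]
costs = source_costs(s, predictions)
beta = fit_weights(costs)
fused = fuse(predictions, beta)
```

In this toy case both costs are positive and the first source has the smaller cost, so it receives the larger weight. When a cost is negative, the objective is no longer convex and the solver may return a local solution on the boundary of the feasible region.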
Algorithm flow
The overall algorithm flow is presented as follows: