Background

With the advent of high-throughput technologies (such as microarrays), an explosive growth of gene-expression data in toxicology applications has given us new insights into the biological processes of hepatotoxicity or nephrotoxicity [1]. Thus, many studies have worked on using gene-expression profiles for the prediction of the potential negative effects of new drugs under development [1, 2]. Because new drugs often lead to some injuries in the liver or kidney, which present various pathological findings, it would be very useful to predict the drug-induced toxicity for multiple organs. However, thus far, most of the earlier works focused on predicting single-organ toxicity with only limited pathological symptoms [39].

For example, Cui et al. [3] used gene-expression profiles obtained from kidney RNA samples with rat cDNA microarrays to develop an SVM-based prediction model that groups gene-expression profiles into four classes according to the severity and type of pathology. Using this model, they could predict the pathologies of 28 test profiles with 100% specificity and 82% sensitivity. Lingkang et al. [4] grouped liver samples by the level of necrosis exhibited in the tissue and used them to develop a random forest classifier with 21 genes. They achieved 90%, 80%, and 60% prediction accuracies against test data derived from the livers of rats exposed to three different hepatotoxins, respectively. Kedar et al. [5] proposed a network model to assess chronic hepatotoxicity based on sub-chronic hepatic gene-expression data in rats, where the directed graph accounts for the interactions among drugs, differentially expressed genes and chronic hepatotoxicity. They estimated phenotypical exposure risk such as toxic hepatopathy, diffuse fatty change and hepatocellular adenoma from gene-expression profile of rats. Low et al. [6] proposed a hybrid model of chemical descriptors and gene-expression data to predict hepatotoxicity. Although the hybrid models did not show better accuracy than the model based on only gene-expression data, the use of chemical descriptors could enrich the interpretation of the model. Minowa et al. [7] developed the prediction model for the onset of drug-induced proximal tubular injury from gene-expression data, and achieved 93% sensitivity and 90% specificity. Bowles et al. [8] quantified several pathological findings and DILI severity and then constructed statistical prediction models from gene-expression profiles for liver pathology in rats and for drug-induced liver injury. For this purpose, they used the Lasso regression and "glmnet" algorithm to extract models for rat liver pathology prediction and used stochastic gradient boosting to extract models for drug-induced liver injury. Zhang et al. [9] employed gene-expression data, cell-based assays, and pathological data to obtain a network of early-response genes as a consensus signature of drug-induced in-vitro and in-vivo toxicity, and used the network to predict liver or kidney toxicity from gene-expression data. The accuracy of prediction model was between 80% and 97% in both liver and kidney.

In this work, we attempt to develop an integrative model of drug-induced toxicity prediction applicable to the prediction of multiple organ pathologies. To achieve this end, we investigated the relationships between distinct pathologies by exploring co-occurrences of pathologies within training samples, and used them to combine individual pathology prediction models for drug-induced toxicity. Consequently, for a test sample, our integrative model can predict the occurrence of the trained pathologies presented in multiple organs.

Methods

Target pathological findings in multi-organs

For experiments, our toxicity prediction model targets 11 liver physiological changes (pathological findings) that signal the presence of developing pathology (Eosinophilic change, Cellular infiltration, Hypertrophy, Degeneration, Pigment deposit, Increased mitosis, Necrosis, Single cell necrosis, Cytoplasmic vacuolization, Proliferation, Fibrosis) and 10 kidney pathological findings (Cellular infiltration, Hypertrophy, Degeneration, Necrosis, Hyaline droplet, Hyaline cast, Dilatation, Basophilic change, Cytoplasmic vacuolization, Regeneration), carefully chosen from Open TG-GATEs http://toxico.nibio.go.jp[10].

In fact, according to TG-GATEs, the researchers extracted the slides of the liver or kidney from all in-vivo samples, and pathologists examined these slides to detect pathological symptoms and their severity in various parts of the liver or kidney. (See Additional file 1, 2) Due to the limitation of available gene-expression data in both the liver and kidney, which cannot cover all possible pathological symptoms, we targeted only the 21 findings, as mentioned above.

For 37 liver and 26 kidney drug-induced pathological findings observed in in-vivo rat samples exposed to one of 131 drugs in TG-GATEs, we simplified them into general terms (e.g., grouping the subtypes of liver degeneration into 'liver degeneration', or grouping the necrosis of bile duct or hepatocyte into liver necrosis.) and then selected only the pathology terms that were induced by at least 5 drugs or more, as the targets of our prediction model.

Training data for toxicity prediction model

For the learning of the toxicity prediction model, we used confirmed case and control samples, with gene-expression profiles for cases with targeted pathological findings and expression profiles for controls without the targeted pathological finding, respectively. All the gene-expression profiles in the training data were normalized by using a robust multi-array average (RMA) method with the RefPlus R package [11], where the reference model was built by selecting 5% of the training samples in each organ and used to derive reference quartiles and probe effects for normalization. To construct the toxicity prediction model, we need to identify gene signatures particular to each of the 21 pathological findings. For this purpose, we applied the t-test to the training data of confirmed case/control samples in each pathological finding and selected the top 5% ranked genes in the increasing order of p-values.

Construction of individual pathology prediction models

For each of the 21 pathological findings, we trained an ensemble model of n k-nearest neighbor (KNN) classifiers (i.e., an ensemble of n sub-prediction models) to construct an individual pathology prediction model. The main reason of developing n sub-prediction models for each pathology prediction is to handle the data imbalance problem (i.e., handling the situation in which the number of case samples is much smaller than the number of control samples), a problem that commonly occurs in training data for adverse drugs reaction [12, 13]. Each sub-prediction model was developed with each of n jackknife samples generated from training data, in which each jackknife sample includes all the confirmed case samples and a randomly selected subset of control samples. The final output of any individual pathology prediction model is determined by aggregating all the outputs of n sub-prediction models for each pathology finding. That is, for a test sample s of gene-expression profile with exposure to a certain drug, our pathology prediction model produces the pathology prediction score (PPS) M F i ( s ) as an output for a pathology finding F i by averaging all the outputs of n sub-prediction models as follows:

M F i ( s ) = 1 n k = 1 n s u b _ M F i k ( s )

The score of M F (s) is in the range of 0[1], where values approaching 0 indicate that a pathological finding F will not occur in the sample s and values closer to 1 indicate that F will occur in the sample s. If M F (s) is higher than a certain threshold, we predict that the pathological finding F would occur in the sample s. Figure 1 illustrates the procedure of producing the PPS as an output of individual pathology prediction.

Figure 1
figure 1

The procedure of producing the pathology prediction score (PPS) in an individual pathology prediction model.

Pathology relationships extraction

To investigate the relationships between pathological findings for drug-induced toxicity prediction, we constructed a pathology occurrence matrix that describes the presentation states of training samples for the targeted pathology findings (i.e., 11 liver and 10 kidney). From this matrix, we can derive the pathology occurrence vector for each pathology finding and use them to calculate the Jaccard similarities between confirmed case samples displaying two different pathological symptoms. Specifically, assuming that there are n training samples and k pathological findings of interest, the pathology occurrence matrix Φ is defined by:

Φ = Φ 1 Φ k = φ 11 φ 1 k φ n 1 φ n k

Where φ ij = if a pathology F j occurred in the ith training sample and φ ij otherwise. Also, the similarity between F i and F j is defined as the Jaccard coefficient between their pathology occurrence vectors (i.e., Φ i and Φ j ) which measures the commonness of sample occurrences of two pathology findings. Formally, the pathology similarity S between F i and F j is calculated by:

S ( F i , F j ) = J a c c a r d ( Φ i , Φ j ) = N 11 N 01 + N 10 + N 11

where N01 is the total number of training samples in which F i does not occur and F j does occur, N10 is the total number of training samples in which F i occur and F j does not occur, and N11 is the total number of training samples in which both F i and F j does occur. The value of pathology similarity is in the range of 0[1], and some examples of pathology similarities for 4 pathological findings (i.e., liver eosinophilic change, liver cellular infiltration, kidney cytoplasmic vacuolization, and kidney regeneration) are shown in Table 1.

Table 1 Examples of the pathology similarities for 4 pathological findings.

The rationale of pathology relationships can be explained as follows: A drug-induced toxicity can present several pathological findings in liver or kidney, and the occurrence of certain pathological finding may be associated with the occurrence of some other pathological finding(s). This is because a drug-induced toxicity can perturb certain gene-expressions or protein activities followed by the perturbation of several pathway activities, leading to various toxicity symptoms [14]. For example, the PPAR-alpha gene is related to liver hypertrophy [15], which is also related to kidney tubular necrosis [16, 17]. ARG1 gene is related to both biliary injury and single cell necrosis [18]. Hence, some symptom(s) of toxicity might be the indirect evidence of other symptom(s) of toxicity. To implement this rationale in our experiments, we estimated the degree of the association between two pathological findings by calculating the Jaccard similarity between their pathology occurrence vectors.

The integrative model for drug-induced toxicity prediction

The construction of the integrative model for drug-induced toxicity prediction is done by the weighted combination of all the 21 individual pathology prediction models. For this purpose, we first obtain the normalized pathology similarities N(F i , F j ) from the pairwise pathology relationships S(F i , F j ) defined as above by exploring co-occurrence samples of two pathological findings. That is, when having k different pathological findings, the normalized pathology similarity between F i and F j is defined by

N ( F i , F j ) = S ( F i , F j ) m = 1 k S ( F i , F m ) 2

Then, our integrative model produces the IP S F i ( s ) integrative prediction score of a test sample s for each pathological finding F i , which is calculated by

IP S F i ( s ) = m = 1 k N ( F i , F m ) M F m ( s )

If the score of IP S F i ( s ) is higher than a certain threshold, it is predicted that the pathology finding F i would occur in the test sample s. Thus, for k distinct pathological findings, our integrative model produces a k-dimensional vector of the IPSs of a test sample s as follows:

I P S ( s ) = IP S F 1 ( s ) IP S F 2 ( s ) IP S F 3 ( s ) IP S F k ( s ) = N ( F 1 , F 1 ) N ( F 1 , F 2 ) N ( F 1 , F 3 ) N ( F 1 , F k ) N ( F 2 , F 1 ) N ( F 2 , F 2 ) N ( F 2 , F 3 ) N ( F 2 , F k ) N ( F 3 , F 1 ) N ( F 3 , F 2 ) N ( F 3 , F 3 ) N ( F 3 , F k ) N ( F k , F 1 ) N ( F k , F 2 ) N ( F k , F 3 ) N ( F k , F k ) M F 1 ( s ) M F 2 ( s ) M F 3 ( s ) M F k ( s )

An illustrative example of the integrative model

For better understanding, we illustrate the integrative model by a toy example (Figure 2). Suppose that we have 4 different pathological findings of interest. Firstly, we develop 4 individual pathology prediction models, producing M F i ( s ) , i = 1, ..., 4, as in Figure 2(a). Secondly, the relationships among the 4 pathological findings are extracted by exploring co-occurrence cases of two different findings in training data, as in Figure 2(b). Thus, if we assume that M F 1 ( s 1 ) =0.4, M F 2 ( s 1 ) =0.4, M F 3 ( s 1 ) =0.5, and M F 4 ( s 1 ) =0.7 for a sample s 1 and M F 1 ( s 2 ) =0.5, M F 2 ( s 2 ) =0.0, M F 3 ( s 2 ) =0.6, and M F 4 ( s 2 ) =0.8 for a sample s 2 , our integrative model would produce the scores of IP S F 1 ( s 1 ) =0.521 and IP S F 1 ( s 2 ) =0.479, as shown in Figure 2(c). From this example, we can notice that the IPS for a certain pathological finding is affected by both the degree of the association with other pathology findings and the results of other pathology prediction models. That is, in the case of a pathological finding F 1 of s 1 , the score of IP S F 1 ( s 1 ) =0.521 gets higher than the score M F 1 ( s 1 ) =0.4 of an individual pathology prediction model because its highly associated finding F2 produces a relatively high score of M F 2 ( s 1 ) =0.4. On the other hand, in the sample of s 2 , the IP S F 1 ( s 2 ) =0.479 gets lower than the score M F 1 ( s 2 ) =0.5 of an individual pathology prediction model because its highly associated finding F 2 produces a low score of M F 2 ( s 2 ) =0.0. Thus, if the threshold of IPS is set to 0.5, our model predicts that the pathological finding F 1 will be presented in the sample s 1 and not in the sample s 2 .

Figure 2
figure 2

An illustrative example of integrative model.

Results and discussion

Evaluation of individual pathology prediction models for multi-organ drug-induced toxicity

To evaluate the performance of individual pathology prediction models in forecasting multi-organ drug-induced toxicity over the targeted pathological findings, we used 3708 rat in-vivo samples exposed to one of our chosen 41 drugs. (See Additional file 3) Here, 41 drugs were chosen from TG-GATEs of which the gene expression profiles are provided in both the liver and kidney samples in all cases. As discussed above, we developed 21 individual pathology prediction models for our targeted pathological findings (i.e., 11 in liver and 10 in kidney) using the KNN-based ensemble classifiers, which produced the outputs of the pathology prediction scores (PPSs). If the PPS output of a pathology prediction model for a test sample is greater than a given threshold, then we predict that the targeted pathological finding will occur in the test sample.

We estimated the prediction performance of each individual pathology prediction model in terms of sensitivity and specificity using the k-fold cross-validation method. That is, original training data of the in-vivo samples mentioned in Methods were divided into k groups of equal size in the unit of drugs. To avoid an over-fitting problem, we used the samples of k- 1 groups for model development (including gene signature selection for the targeted pathology, pathology relationship extraction, and prediction model learning), and used the samples of 1 remaining group to test model performance. This process was iterated k times for different selection of a test group (See Additional file 4). Here, the threshold value of PPS in each pathology prediction model was set to maximize the geometric mean of sensitivity and specificity. The geometric mean of sensitivity and specificity is often used to evaluate a prediction model with imbalanced data set.

Figure 3 displays the evaluation results of the 21 individual pathology prediction models. As seen in this figure, all the individual pathology prediction models tend to show relatively lower sensitivities (approximately 70% on average) than specificities (approximately 87% on average) overall in predicting the targeted pathological findings. In addition, the standard deviations of sensitivity over all the models seems to be considerably high, ±18% in liver pathology and ±17% in kidney pathology, compared to those of specificity, i.e., ±8% in liver pathology and ±9% in kidney pathology. Even the sensitivities of several pathology prediction models (e.g., models for liver single cell necrosis and kidney hypertrophy) are extremely low as <50%. Since the performance of individual pathology prediction models depends on the choice of the PPS thresholds, it is worthwhile to draw ROC curves for model evaluation. The ROC curves of 6 pathology prediction models, which show the most distinction between PPS-based models and IPS-based models, are given in Figure 4. All the ROC curves of 21 pathology prediction models are also given in Additional file 5.

Figure 3
figure 3

Evaluation of the individual prediction models in forecasting 11 liver and 10 kidney pathological findings.

Figure 4
figure 4

ROC curves of the pathology prediction models regarding 3 liver and 3 kidney pathological findings.

Evaluation of the proposed integrative models for multi-organ drug-induced toxicity

We evaluated the prediction performance of the proposed integrative models in the same manner as for the individual pathology prediction models, using the k-fold cross-validation method. Particularly, we extracted pathology relationships only based on the samples of k-1 groups chosen as training set. (See Additional file 4) Here, the threshold of IPS was also set to maximize the geometric mean of sensitivity and specificity in forecasting the targeted pathological changes.

Figure 5 displays the evaluation results of our integrative models for the prediction of 21 pathological findings. We observe that overall prediction performance is quite impressive, increasing sensitivity considerably (approximately 84% on average) without much sacrifice of specificity (approximately 83% on average). In addition, the variability of the prediction performance for all pathologies in the integrative model is much lesser than the individual pathology models it is comprised of, indicating that an integrative approach can provide more stable performance. Figure 4 shows the ROC curves of the integrative models for the 6 pathological findings (3 in liver and 3 in kidney). It is noted that the ROC curves of the IPS-based integrative models are located completely above ROC curves of the PPS-based individual models in several cases. We also measured the area under the ROC curves (AUC) of the IPS-based integrative models and the PPS-based individual models (see Figure 6). As seen in Figure 6, the IPS-based integrative models showed much better performance than the PPS-based individual models overall. On average, approximately 20% of AUC in the proposed integrative models increased over all the pathology prediction, compared to individual pathology models. Particularly, for some pathological findings like liver single cell necrosis and kidney cytoplasmic vacuolization, the IPS-based integrative models achieved distinct improvement in prediction performance. For example, regarding liver single cell necrosis, the IPS-based integrative model can make high quality predictions within 93% of its AUC, but the PPS-based individual models only display this performance within 26% of AUC.

Figure 5
figure 5

Evaluation of the proposed integrative models in forecasting 11 liver and 10 kidney pathological findings.

Figure 6
figure 6

Comparison of prediction performance in the IPS-based integrative models with the PPS-based individual pathology models.

Furthermore, we compared our integrative model performance with the prediction performance given by the recent work of Bowles et al. (2013) [8] which also employs TG-GATEs to train and evaluate the prediction model. Table 2 shows the AUC comparison of the both models in predicting liver toxicity only, since the work of Bowles et al. only concerned itself with liver toxicity. Even if it shows limited comparison in predicting only three liver pathological findings (liver necrosis, hypertrophy, cellular infiltration), our integrative model performance seems to be comparable to their work.

Table 2 AUC comparison of the proposed integrative model with the work of Bowles et al.

Conclusions

We introduced an integrative model approach for multi-organ drug-induced toxicity prediction. For the development of integrative models, we investigated pathology relationships by considering the co-occurrence of targeted pathological states in training samples, and used them to unite individual pathology prediction models. Consequently, our integrative models produced better prediction performance over all of the 21 targeted pathological findings, than individual pathology prediction models.

Even when the performance of individual pathology prediction models is extremely low, our integrative models could improve the prediction performance considerably by referring to the prediction results of other highly associated pathological findings. In this work, our prediction models were developed using limited training set, so it is expected that more training samples would make the better estimation of the real associations among toxicity pathological findings.