Investigation of model stacking for drug sensitivity prediction
Abstract
Background
A significant problem in precision medicine is the prediction of drug sensitivity for individual cancer cell lines. Predictive models such as Random Forests have shown promising performance when predicting from individual genomic features such as gene expression. However, the availability of various other data types, including information on multiple tested drugs, motivates the design of predictive models that incorporate these heterogeneous data sources.
Results
We explore the predictive performance of model stacking and the effect of stacking on predictive bias and squared error. In addition, we discuss the analytical underpinnings supporting the advantages of stacking in reducing squared error and the inherent bias of random forests in the prediction of outliers. The framework is tested on a setup that includes gene expression, drug target, physical property and drug response information for a set of drugs and cell lines.
Conclusion
The performance of individual and stacked models is compared. We note that stacking models built on two heterogeneous datasets provides superior performance to stacking different models built on the same dataset. It is also noted that stacking provides a noticeable reduction in the bias of our predictors when the dominant eigenvalue of the principal axis of variation in the residuals is significantly higher than the remaining eigenvalues.
Keywords
Drug sensitivity prediction; Stacking; Bias
Abbreviations
- AUC: Area under the curve
- DL: Deep learning
- KNN: k-Nearest neighbor
- MSE: Mean squared error
- NMSE: Normalized mean squared error
- NN: Neural network
- RF: Random forest
Background
In precision medicine, drug sensitivity prediction is a significant problem. The primary goal of improving prediction accuracy for precision medicine opens up problems that are broadly relevant to other machine learning tasks. In this article, we examine the stacking of predictive models and its influence on prediction accuracy and modeling bias. The principal individual model considered in this article is Random Forests (RF), since previously reported studies [1, 2, 3, 4] have shown RF to outperform multiple other approaches in drug sensitivity prediction applications. However, RF models can suffer from an inherent bias: they underpredict sensitivities above the mean and overpredict sensitivities below the mean. There have been some recent studies addressing this bias [5, 6], but none have explored the effect of stacking on bias. In this article, we illustrate that stacking of model predictions automatically lowers the inherent bias in RF based models without having to resort to explicit bias reduction approaches. Furthermore, we explore the theoretical underpinnings of the stacking operation with respect to mean squared error and show how stacking produces results that are no worse than the worst individual model.
To demonstrate the role of stacking in accuracy and bias reduction, we created a drug sensitivity prediction setup with multiple data sources. The main motivation behind stacking is that each model provides complementary information; for that reason we have included a variety of datasets to build our individual models. We consider multiple cell lines and multiple tested drugs, as well as genomic information in the form of gene expression for each cell line. Each drug is characterized by its physical properties and its potential targets. The drug responses are normalized area under the curve (AUC) values obtained from cell viability curves. This setup allows us to explore incorporating complementary information into our prediction models. For instance, gene expression provides information on each cell line, whereas drug targets provide information on each drug that is complementary to the genomic information. Thus, the effects of both cell line and drug information can be included in prediction. Note that some models can be trained only on cell lines with a fixed drug, whereas other models can be trained on drugs with a fixed cell line, and they can be combined to produce an integrated prediction model. This study provides a theoretically sound, but easy to implement, methodology to jointly analyze multiple pharmacogenomics databases [7, 8] that include information on multiple cell lines and multiple drugs. Thus, for a new cancer patient, a biopsy can be used to generate a genomic profile, a drug screen can be run to estimate cell viability for the drugs in the screen, and this information can be combined with prior database information to predict sensitivities for drugs that have not been tested in the screen. Improvement in performance will motivate us to explore personalized medicine from the perspective of training on both genomic and drug specific features.
Methods
Drug sensitivity prediction
To investigate stacking performance, we selected individual modeling techniques that have previously been shown to perform well in drug sensitivity prediction scenarios. These methods include a Random Forest regression approach, Neural Network based prediction, and KNN based sensitivity estimation using drug target profiles. We provide a brief overview of these three approaches below.
Random forest
Random Forest regression refers to an ensemble of regression trees [9] in which a set of T un-pruned regression trees is generated based on bootstrap sampling from the original training data. For selecting the feature to split on at each node, a random subset of m of the total M features is used. The combination of bagging (bootstrap sampling for each tree) and random subspace sampling (node splits selected from a random subset of features) increases the independence of the generated trees. Thus, averaging the predictions over multiple trees yields lower variance than individual regression trees.
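The ingredients above (T trees, bootstrap sampling, a random subset of m ≈ M/3 features per split, and a minimum node size) map directly onto standard library parameters. The following is a minimal sketch using scikit-learn on toy data; the data, sizes, and parameter values are illustrative and not the paper's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy regression data standing in for genomic features (illustrative only).
X = rng.normal(size=(200, 30))
y = 2.0 * X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(scale=0.1, size=200)

rf = RandomForestRegressor(
    n_estimators=50,      # T trees
    max_features=1 / 3,   # random subspace: m = M/3 features per split
    bootstrap=True,       # bagging: each tree sees a bootstrap sample
    min_samples_split=5,  # a node with fewer than n_size samples is not split
    random_state=0,
).fit(X[:150], y[:150])

# Forest prediction is the average of the individual tree predictions.
pred = rf.predict(X[150:])
```

Averaging over the 50 de-correlated trees is what reduces the variance relative to any single regression tree.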
Process of splitting a node
The partition γ∗ that maximizes C(γ,η_{ P }) over all possible partitions is selected for node η_{ P }. Note that for a continuous feature with n samples, a total of n partitions need to be checked. Thus, the computational complexity of each node split is O(mn). During the tree generation process, a node with fewer than n_{ size } training samples is not partitioned further.
Forest prediction
Using the randomized feature selection process, we fit the tree based on the bootstrap sample {(X_{1},Y_{1}),...,(X_{ n },Y_{ n })} generated from the training data.
Neural networks
Deep Learning (DL) is a revived Neural Network (NN) based approach that is increasingly popular due to its high predictive power in scenarios with a large number of samples.
For this study, the model parameters were selected using a grid search on a validation set. In general terms, and for all the models, the early-stopping deviance criterion was set to 0.001, with an ℓ_{2} regularization of 0.0001. The chosen activation function was tanh, with 4 hidden layers of equal size; the number of neurons in each layer was set equal to the number of input features.
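The paper trained its network with H2O's deep learning implementation; the sketch below only mirrors the reported hyperparameters in scikit-learn (tanh activation, 4 hidden layers with as many neurons as input features, ℓ2 = 1e-4, early-stopping tolerance 1e-3). The data and remaining settings are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.1, size=300)

n_features = X.shape[1]
nn = MLPRegressor(
    hidden_layer_sizes=(n_features,) * 4,  # 4 layers, width = no. of inputs
    activation="tanh",
    alpha=1e-4,           # l2 regularization strength
    early_stopping=True,  # hold out a validation fraction internally
    tol=1e-3,             # stopping tolerance on improvement
    max_iter=500,
    random_state=1,
).fit(X, y)

pred = nn.predict(X[:5])
```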
Sensitivity estimation using drug targets
The average of the k closest training vectors is our prediction. In our model, we have chosen the k=5 closest samples.
We have developed two separate models for predicting drug sensitivity with KNN and target data. In the first method, which we denote KNN Direct, we directly estimate AUC using all available target data for a single cell line. In the second method, called KNN Residual, we instead predict the residuals (actual values minus the per-drug mean) for each drug on a given cell line. We then add the residual prediction back onto the mean AUC of each drug for our final prediction.
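The two KNN variants above differ only in what they regress on. A minimal sketch, with randomly generated stand-ins for the drug-target profiles and AUC matrix (sizes and values are assumptions, not the paper's data):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)

# Stand-ins: binary target profiles for 40 drugs, and an AUC matrix
# over 20 cell lines x 40 drugs.
targets = rng.integers(0, 2, size=(40, 15)).astype(float)
auc = rng.uniform(0.2, 0.9, size=(20, 40))
drug_mean = auc.mean(axis=0)  # mean AUC of each drug across cell lines

cell = 0                      # predict for one fixed cell line
train, test = np.arange(30), np.arange(30, 40)

# KNN Direct: estimate AUC itself from drug-target profiles, k = 5.
direct = KNeighborsRegressor(n_neighbors=5).fit(
    targets[train], auc[cell, train])
pred_direct = direct.predict(targets[test])

# KNN Residual: estimate the per-drug mean-centered AUC, then add each
# test drug's mean AUC back for the final prediction.
resid = KNeighborsRegressor(n_neighbors=5).fit(
    targets[train], auc[cell, train] - drug_mean[train])
pred_residual = resid.predict(targets[test]) + drug_mean[test]
```

Centering by the per-drug mean lets the KNN model only the cell-line-specific deviation, which is the part the target profiles are expected to explain.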
Integrated prediction
Up to this point, we have considered each model independently, i.e. one model for one set of data. However biological processes are complex and restricting our data to a single type rarely shows us the whole picture. To overcome this limitation, we have also utilized a systems genomics approach in our predictions.
Linear stacking
Due to its high accuracy and low computational cost, we have focused mainly on the Random Forest for our analysis of stacking. By comparison, the Neural Network has comparable accuracy but significantly longer training times, which made it impractical for our purposes. It should be noted, however, that in principle linear stacking functions independently of the individual models, and in most practical scenarios the model with the highest accuracy for each given dataset should be chosen.
Analysis of stacking
In this section we illustrate some attractive benefits of the stacking operation beyond being a simple tool to combine outputs from different models. Our main focus is on demonstrating how stacking reduces bias in Random Forest (RF) prediction. We conceptualize the distribution of ensemble predictions arising from each tree in the RF and frame a Bayesian hierarchical model. It is shown that, under the assumption of Gaussianity, the Bayes rule under mean square loss turns out to be a linear combination of individual model outputs.
Now, observe that each tree attempts to predict the target μ(x)=E(Y|x) and the RF predictor, \(\bar {Y}\) (obtained in 7), emerges as the sample average, \(\bar {Z}\) of Z(x). However, finite sample tree predictions are biased [6] resulting in E(Z_{ j }(x))=α_{ j }(n)+β_{ j }(n)μ(x) and \(Var(Z_{j}(\boldsymbol {x}))=\sigma ^{2}_{j} \;\; j=1,2,..,T\), where the additive bias α_{ j }(n) is a sequence of constants that goes to 0 as n→∞ and the multiplicative bias β_{ j }(n) is a sequence of constants that approaches 1 as n→∞ under some smoothness condition on true μ(x) [16]. Note that, in this construction \(\sigma ^{2}_{j}\) can be interpreted as the variance of individual tree estimates and, therefore, is of the order k_{ n }/n where k_{ n } is approximately the number of terminal nodes and n is the number of samples on which the tree is built [17].
For illustration purposes, we assume α_{ j }=0, β_{ j }=β>0 and \(\sigma ^{2}_{j}=\sigma ^{2},\;\; j=1,2,...,T\). In this formulation, 0<β<1 corresponds to underprediction by the RF estimate, as is typical for large values of responses, and β>1 corresponds to overprediction, as is typical for small values of responses [5]. For notational simplicity, we suppress the arguments n,Θ and x in the relevant statistics henceforth. Under the assumptions of Gaussianity and conditional independence, the joint distribution of [Z|μ,β,σ^{2}] is \(\prod _{j=1}^{T} \text{Normal}\left (\beta \mu,\sigma ^{2}\right)\). If there are no other models, we can assume a prior π(μ,σ^{2})∝1 (note that μ and β are not identifiable in this case) and the posterior mean of \(\mu |\sigma ^{2},\mathcal {D}_{F}\) turns out to be the familiar RF estimate. Suppose we have another model, M, potentially operating on a different set of inputs, x_{ m }, but predicting the same response variable Y. We denote the training data for this model M as \(\mathcal {D}_{M}\). The output of this model is μ_{ m }, which is an estimator of E(Y|x_{ m }). If we wish to pool both RF and model M together to generate predictions of Y, we can develop a hierarchical structure with μ_{ m } as a prior mean for μ, so that the posterior of μ is conditional on both \(\mathcal {D}_{F}\) and \(\mathcal {D}_{M}\). For simplicity, let us assume conjugacy and impose a Normal(μ_{ m },τ^{2}) prior on μ. If M is another ensemble model, τ^{2} can be computed in the same vein as σ^{2}. If M is deterministic, then a procedure to compute τ^{2} in a general setting is described in [18, 19].
Note that the Bayes estimate under squared error loss is the posterior mean λ, which has a form similar to the foregoing linear stacking estimator.
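For concreteness, the standard conjugate-normal computation gives this posterior mean explicitly; with the likelihood \(\prod_{j=1}^{T}\text{Normal}(\beta\mu,\sigma^2)\) and the Normal(μ_{ m },τ^{2}) prior, it is the precision-weighted combination

```latex
\lambda \;=\; \mathbb{E}\left[\mu \mid \mathcal{D}_F, \mathcal{D}_M\right]
\;=\; \frac{\dfrac{T\beta}{\sigma^{2}}\,\bar{Z} \;+\; \dfrac{1}{\tau^{2}}\,\mu_{m}}
           {\dfrac{T\beta^{2}}{\sigma^{2}} \;+\; \dfrac{1}{\tau^{2}}}
\;=\; \frac{C\beta\,\bar{Z} + \mu_{m}}{C\beta^{2} + 1},
\qquad C = \frac{T\tau^{2}}{\sigma^{2}},
```

which recovers the limiting forms discussed next: for σ^{2}≪τ^{2} (C≫1), λ≈\(\bar{Z}/\beta\), and in general λ equals \(\frac{C\beta}{1+C\beta^{2}}\bar{Z}+\frac{1}{1+C\beta^{2}}\mu_{m}\).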
How is this representation of the stacking estimator insightful? Observe that if σ^{2} is small, in particular if σ^{2}≪τ^{2} and β>1, then \(\lambda \approx \frac {1}{\beta }\bar {Z}\). Thus, when the RF ensemble overpredicts, the stacking estimator downweights the RF estimator (with negligible contribution from μ_{ m }), thereby reducing the bias. On the other hand, if σ^{2}≪τ^{2} and 0<β<1, then \(\lambda \approx \frac {C\beta }{1+C\beta ^{2}}\bar {Z}+\frac {1}{1+C\beta ^{2}}\mu _{m}\), where C=Tτ^{2}/σ^{2}≫1. In this situation the RF ensemble underpredicts, but the stacking operation counteracts in the following way: (a) when Cβ≤1, the stacking estimate underweights the RF estimate but adds a non-trivial fraction of μ_{ m }; in the extreme case where \(\beta (\in \mathbb {R}^{+})\) is in the neighborhood of 0, the stacking estimator puts no weight on the RF estimate and uses μ_{ m } alone as the prediction, thereby reducing the RF bias; (b) when Cβ≫1, the stacking estimator upweights the RF estimate with minimal contribution from μ_{ m }. Clearly, in all three foregoing situations, stacking helps reduce the bias of the RF estimates.
What happens when σ^{2} and τ^{2} are comparable, or σ^{2}≫τ^{2}? Our argument from the previous paragraph suggests that the debiasing characteristic of the stacking operation will critically hinge on T. However, arbitrarily large T is not useful because, after a certain number of trees, individual tree outputs become correlated, violating the fundamental premise of conditional independence in our setup. Consequently, the effect of the stacking operation on debiasing RF output is ambiguous in this regime.
Observe that in practice, we do not need to estimate the relevant parameters in the coefficients of \(\bar {Z}\) and μ_{ m } in (11). We can simply take observed responses (that were not used to obtain \(\bar {Z}\) and μ_{ m }) and regress them on the predictions obtained from RF and model M. The regression coefficients can be treated as estimates of the coefficients in (11), while the intercept can be interpreted as an estimate of the average additive bias. Thus, we argue that the standard linear stacking operation should behave according to the formulation above and will be an effective debiasing device subject to the variance condition.
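The regression recipe above can be sketched directly: hold out a validation set, collect each component model's predictions on it, and regress the held-out responses on those predictions. The component models, features, and split sizes below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.5, size=300)

tr, val, te = np.arange(100), np.arange(100, 150), np.arange(150, 300)

# Two component models (in the paper each may use a different feature set).
m1 = RandomForestRegressor(n_estimators=50, random_state=3).fit(X[tr], y[tr])
m2 = LinearRegression().fit(X[tr], y[tr])

# Regress held-out responses on component predictions: the fitted slopes
# estimate the stacking coefficients and the intercept estimates the
# average additive bias.
P_val = np.column_stack([m1.predict(X[val]), m2.predict(X[val])])
stack = LinearRegression().fit(P_val, y[val])

# Stacked prediction on new data.
P_te = np.column_stack([m1.predict(X[te]), m2.predict(X[te])])
y_hat = stack.predict(P_te)
```

Crucially, the validation responses used to fit the stacking weights must not have been used to train the component models, otherwise the weights absorb overfitting rather than bias.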
The fact that we need σ^{2}≪τ^{2} to make the stacking estimator operate as a debiasing device indicates that we ought to design the stacking operation so that this condition is satisfied. Consider a generic situation where \(\mathcal {D}_{M}\) consists of n_{1} independent samples and the feature matrix x_{ m } is of dimension n_{1}×p_{1}. \(\mathcal {D}_{F}\) consists of n_{2} samples and the corresponding feature matrix x is of dimension n_{2}×(p_{1}+p_{2}), with col(x_{ m })⊂col(x); i.e., \(\mathcal {D}_{F}\) includes all the features observed in \(\mathcal {D}_{M}\) along with p_{2} additional features. We must predict the response utilizing the entire set of p_{1}+p_{2} features. One can easily combine these two training sets by training an RF on \(\mathcal {D}_{M}\), obtaining μ_{ m } and τ^{2}, and then using this prior information for the RF trained on \(\mathcal {D}_{F}\). In other words, one can simply stack RFs trained on \(\mathcal {D}_{M}\) and \(\mathcal {D}_{F}\). We call this operation vertical stacking. Since both are RF estimators, σ^{2} is of order k_{.}/n_{2} and τ^{2} is of order k_{.}/n_{1}. Since k_{.} is typically user-specified and can be held constant in both RFs, the variances of the stacking components are essentially determined by the sample sizes of the respective training sets. Clearly, if n_{2}<n_{1}, the above condition relating the variances of the stacking components cannot be enforced. One could argue that the variance condition can be maintained by switching the generic labels σ^{2} and τ^{2}, but in this situation \(\mathcal {D}_{F}\) contains more information than \(\mathcal {D}_{M}\), and hence we would like to put more weight on the RF trained on \(\mathcal {D}_{F}\). In other words, the stacking operation should more effectively debias the RF estimates obtained from \(\mathcal {D}_{F}\) than the other way around.
Results
In this section, we first demonstrate the performance of vertical stacking and horizontal stacking of two RF components on a synthetic dataset and a real dataset. For both datasets we observe that horizontal stacking is not only more effective in reducing bias but also consistently outperforms its vertical counterpart in terms of MSE. Next, we demonstrate how linear stacking of different models operating on different non-overlapping feature sets outperforms each individual component in terms of both bias and MSE reduction.
Synthetic data
We generate a 2000×100 matrix of covariates drawn from a Normal(0,1) distribution. Then a random set of 100 weights is created: half are randomly selected to be "weak" predictors and are drawn from a Uniform(0,0.5) distribution, while the other half are "strong" predictors drawn from Uniform(1.5,3). These weights are linearly combined with our covariates to create a set of 2000 responses. We then compute the variance of these responses, add Gaussian noise with variance set to 3% of the sample variance of the noise-free data, and add an intercept of 1.4. Finally, the noisy responses are normalized into the range [−1,1].
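The generation procedure above can be sketched as follows (the random seed and the min-max normalization formula are assumptions; the paper does not specify them):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 2000, 100

# Covariates from Normal(0, 1).
X = rng.normal(0.0, 1.0, size=(n, p))

# Half "weak" weights from Uniform(0, 0.5), half "strong" from
# Uniform(1.5, 3), assigned to randomly chosen feature positions.
w = np.empty(p)
idx = rng.permutation(p)
w[idx[: p // 2]] = rng.uniform(0.0, 0.5, size=p // 2)
w[idx[p // 2 :]] = rng.uniform(1.5, 3.0, size=p // 2)

y_clean = X @ w
# Gaussian noise with variance equal to 3% of the sample variance of the
# noise-free responses, plus an intercept of 1.4.
noise = rng.normal(0.0, np.sqrt(0.03 * y_clean.var()), size=n)
y = y_clean + noise + 1.4

# Min-max normalize the noisy responses into [-1, 1].
y = 2.0 * (y - y.min()) / (y.max() - y.min()) - 1.0
```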
The data is sectioned off into three groups. The first group is a set of 100 samples that constitutes our initial training set. The second group contains 50 samples and serves as a validation set for building the stacking ensemble predictors. The third group contains 500 samples used for testing. The remaining samples are reserved for later addition into group one. The entire process is then repeated to generate 100 independent sets of data, which are treated as replicates.
To illustrate the operation of vertical and horizontal stacking, we divide our synthetic training data into the groups illustrated in Fig. 2 and build our individual and stacked models. Each individual model is a Random Forest with 50 trees, and each tree utilizes one-third of the input features. When splitting our features, we make certain that each horizontal group has at least 25 weak and 25 strong features. We then add samples, 20 at a time, to each group, rebuild our models, and re-estimate the MSE.
Analysis of CCLE data
Analysis of GDSC data
We have noticed this to be particularly effective when values are misreported in nanomolar instead of the standard micromolar. For the final value, we pick the median of all remaining values. These target values are then binarized using a threshold of one-half of the maximum dosage of the respective compound (also taken from GDSC). A target is considered inhibited (value of 1) if the target value is less than one-half the maximum dose; otherwise, the value is set to 0.
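The cleaning-and-binarization step can be sketched as below. The example values, the unit-misreport detection threshold, and the variable names are all hypothetical; the paper specifies only the median aggregation and the half-max-dose cutoff.

```python
import numpy as np

# Hypothetical reported activity values for one compound-target pair,
# in micromolar; 5000.0 looks like a nanomolar misreport.
values_um = np.array([4.8, 5.2, 5000.0, 5.1])
max_dose_um = 20.0  # maximum tested dose of the compound (from GDSC)

# Convert suspected nanomolar entries back to micromolar before
# aggregating (the 100x-max-dose heuristic is an assumption).
cleaned = np.where(values_um > 100 * max_dose_um,
                   values_um / 1000.0, values_um)

# Final value: median of the remaining values, then binarize at half the
# maximum dose: 1 = target inhibited, 0 = not inhibited.
target_value = float(np.median(cleaned))
inhibited = int(target_value < max_dose_um / 2)
```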
Explanation of all individual techniques used to predict drug AUC. Methods utilizing residuals predict the sample-mean-centered sensitivities (actual AUC − mean AUC) instead of the AUC directly
| Method | Description |
|---|---|
| Mean | Prediction using the mean AUC of each drug |
| KNN Direct | K Nearest Neighbor (KNN) approach using the actual AUC with drug target data |
| KNN Residual | KNN using the residuals with drug target data |
| NN GE | Neural Network on gene expressions |
| NN Phy Direct | Neural Network on chemical descriptors of drugs |
| RF Phy Residual | Random Forest on chemical descriptors of drugs using the residuals |
| RF GE | Random Forest on gene expression |
Performance of Single Predictors in terms of correlation coefficient between predicted and actual AUCs (correlation) and normalized mean square error (NMSE) for predicting AUC. Models used for building higher order linear ensembles are shown in bold
| Method | Correlation | NMSE |
|---|---|---|
| Mean | 0.6345 | 1 |
| KNN Residual | 0.6786 | 0.931 |
| KNN Direct | 0.3623 | 1.555 |
| NN GE | 0.7033 | 0.8613 |
| NN Phy Direct | 0.3485 | 1.947 |
| RF Phy Residual | 0.6819 | 0.8960 |
| RF GE | 0.7276 | 0.7910 |
Correlation coefficients and NMSEs (in parentheses) of second order linear ensembles with two component models for AUC prediction. The top 3 predictors are shown in bold
| | KNN Residual | NN GE | RF Phy Residual | RF GE |
|---|---|---|---|---|
| Mean | 0.7181 (0.8092) | 0.7120 (0.8266) | 0.7090 (0.8324) | 0.7264 (0.7919) |
| KNN Residual | | 0.7492 (0.7341) | 0.7197 (0.8092) | 0.7550 (0.7225) |
| NN GE | | | 0.7455 (0.7457) | 0.7258 (0.7919) |
| RF Phy Residual | | | | 0.7504 (0.7341) |
Correlation coefficient between predicted and actual AUCs (correlation), and normalized mean square error (NMSE) for predicting AUC of our best single predictor, the RF GE, and linear stacking of five individual predictive models that appear in bold in Table 2
| | Correlation | NMSE |
|---|---|---|
| RF GE | 0.7276 | 0.791 |
| RF GE + KNN Residual | 0.7550 | 0.7225 |
| Linear Stacking Ensemble | 0.7746 | 0.6705 |
Discussion
Comparison of bias correction techniques for improving bias angle and overall error. From top to bottom: our best individual predictor RF GE (bolded); RF GE ensembled with our KNN utilizing the drug targets; RF GE ensembled with NN GE alone; RF GE ensembled with RF Phys Residual; and the linear ensemble of all methods that are bolded in Table 2. For each method we show the correlation coefficient between predicted and actual AUCs (correlation), normalized mean squared error (NMSE), mean θ across all drugs (θ_{ μ }), and the 95% bootstrap confidence interval lower and upper bounds (θ_{ L } and θ_{ H }, respectively). BC1 and RRot denote our RF GE corrected using the techniques found in [5]
| | Correlation | NMSE | θ_{ μ } | θ_{ L } | θ_{ H } |
|---|---|---|---|---|---|
| RF GE | 0.7276 | 0.7910 | 38.27° | 37.34° | 39.04° |
| RF GE + KNN Residual | 0.7550 | 0.7225 | 35.23° | 34.07° | 36.30° |
| RF GE + NN GE | 0.7258 | 0.7919 | 38.22° | 37.39° | 39.03° |
| RF GE + RF Phys Residual | 0.7504 | 0.7341 | 35.26° | 34.23° | 36.12° |
| Linear Stacking Ensemble | 0.7746 | 0.6705 | 34.25° | 33.15° | 35.26° |
| BC1 | 0.7184 | 0.8092 | 40.61° | 40.00° | 41.11° |
| RRot | 0.7084 | 0.8382 | 40.60° | 40.02° | 41.12° |
Conclusions
Drug interactions of cancer cell lines are complex biological processes that cannot be fully characterized using only genomic and drug properties. Accurate drug sensitivity predictions for personalized medicine will require the use of a variety of feature sets from multiple data sources. In this article we have shown that by incorporating drug target data from PubChem and the physical properties generated using PaDEL, we are able to improve the prediction accuracy of a Random Forest model trained on gene expression data. In particular, we have shown that such ensemble learners are effective in automatically removing the bias inherent in Random Forest models. We have also derived a necessary condition for the linear ensemble to be an effective debiasing device and described a designed approach to the stacking operation. In the future, other sources of data can be included to improve prediction accuracy. For example, recent models built on protein-protein interaction networks [23] could provide information that is not captured by our current stacked model. However, we note that the entire theoretical premise is built upon the assumption of linear bias. We propose to investigate a more general stacking approach to handle non-linear biases.
Notes
Acknowledgements
Not applicable.
Funding
This work was supported by NIH grant R01GM122084-01. The publication costs of this article were funded by NIH grant R01GM122084.
Availability of data and materials
For the analysis of stacking, synthetic data can be downloaded using the following link, https://tinyurl.com/y82gb7x3 while gene expression and area under the curve values are from the Cancer Cell Line Encyclopedia https://portals.broadinstitute.org/ccle.
Drug Target data and structure files are taken from PubChem https://pubchem.ncbi.nlm.nih.gov/. Chemical descriptors are generated using PaDEL-Descriptor software http://www.yapcwsoft.com/dd/padeldescriptor/ using the structure files. Gene Expression and Area Under the Curve values for cell lines is available in the Genomics of Drug Sensitivity in Cancer repository, http://www.cancerrxgene.org/.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 19 Supplement 3, 2018: Selected original research articles from the Fourth International Workshop on Computational Network Biology: Modeling, Analysis, and Control (CNB-MAC 2017): bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-19-supplement-3.
Authors’ contributions
Performed predictions for individual models: KM, RR, CD. Conceived and designed the stacking algorithm: KM, SG, RP. Analyzed the results: KM, SG, RP. Wrote the article: KM, RR, CD, SG, RP. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Wan Q, Pal R. An ensemble based top performing approach for NCI-DREAM drug sensitivity prediction challenge. PLOS ONE. 2014; 9(6):101183.
- 2. Haider S, Rahman R, Ghosh S, Pal R. A copula based approach for design of multivariate random forests for drug sensitivity prediction. PLOS ONE. 2015; 10(12):0144490.
- 3. Costello JC, et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat Biotechnol. 2014:1202–12. https://doi.org/10.1038/nbt.2877.
- 4. Riddick G, Song H, Ahn S, Walling J, Borges-Rivera D, Zhang W, Fine HA. Predicting in vitro drug sensitivity using Random Forests. Bioinformatics. 2011; 27(2):220–4.
- 5. Song J. Bias correction for random forest in regression using residual rotation. J Korean Stat Soc. 2015; 44:321–6.
- 6. Zhang G, Lu Y. Bias-corrected random forests in regression. J Appl Stat. 2012; 39(1):151–60.
- 7. Barretina J, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012; 483(7391):603–7.
- 8. Yang W, et al. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2013; 41(D1):955–61.
- 9. Breiman L. Random forests. Mach Learn. 2001; 45(1):5–32.
- 10. Meinshausen N. Quantile regression forests. J Mach Learn Res. 2006; 7:983–99.
- 11. Rahman R, Haider S, Ghosh S, Pal R. Design of probabilistic random forests with applications to anticancer drug sensitivity prediction. Cancer Inform. 2015; 14(Suppl 5):57.
- 12. The H2O.ai team. H2O: R interface for H2O. 2017. R package version 3.10.3.4. https://github.com/h2oai/h2o-3. Accessed 15 Feb 2017.
- 13. Phan W, et al. Deep Learning with Deep Water. 2017. http://h2o.ai/resources. Accessed 15 Feb 2017.
- 14. Cook D. Practical Machine Learning with H2O: Powerful, Scalable Techniques for Deep Learning and AI. Sebastopol: O'Reilly Media; 2016.
- 15. Grasso CS, Tang Y, Truffaux N, et al. Functionally-defined therapeutic targets in diffuse intrinsic pontine glioma. Nat Med. 2015. https://doi.org/10.1038/nm.3855.
- 16. Biau G. Analysis of a random forests model. J Mach Learn Res. 2012; 13(1):1063–95.
- 17. Devroye L, Györfi L, Lugosi G. A Probabilistic Theory of Pattern Recognition. Berlin: Springer; 1996.
- 18. Ghosh S, Gelfand AE, Mølhave T. Attaching uncertainty to deterministic spatial interpolations. Stat Methodol. 2012; 9(1-2):251–64. https://doi.org/10.1016/j.stamet.2011.06.001.
- 19. Paci L, Gelfand AE, Cocchi D. Quantifying uncertainty for temperature maps derived from computer models. Spatial Stat. 2015; 12:96–108. https://doi.org/10.1016/j.spasta.2015.03.005.
- 20. Kononenko I. Estimating attributes: analysis and extensions of RELIEF. In: Proceedings of the European Conference on Machine Learning (ECML-94). Secaucus: Springer; 1994. p. 171–82.
- 21. Yap CW. PaDEL-Descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem. 2010; 32(7):1466–74.
- 22. Wang Y, Suzek T, Zhang J, Wang J, He S, Cheng T, Shoemaker BA, Gindulyte A, Bryant SH. PubChem BioAssay: 2014 update. Nucleic Acids Res. 2013; 42(Database issue):1075–82.
- 23. Stanfield Z, Coskun M, Koyutürk M. Drug response prediction as a link prediction problem. Sci Rep. 2017; 7. https://doi.org/10.1038/srep40321.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.