Assessment of uncertainty in chemical models by Bayesian probabilities: Why, when, how?

  • Special Series: Statistics in Molecular Modeling
  • Guest Editor: Anthony Nicholls
Journal of Computer-Aided Molecular Design

Abstract

A prediction of a chemical property or activity is subject to uncertainty. Which types of uncertainty to consider, whether to account for them in a differentiated manner, and with which methods, depends on the practical context. In chemical modelling, general guidance on the assessment of uncertainty is hindered by the wide variety of underlying modelling algorithms, high-dimensional problems, the acknowledgement of both qualitative and quantitative dimensions of uncertainty, and the fact that statistics offers alternative principles for uncertainty quantification. Here, a view of the assessment of uncertainty in predictions is presented with the aim of overcoming these issues. The assessment sets out to quantify uncertainty representing error in predictions and is based on probability modelling of errors, where uncertainty is measured by Bayesian probabilities. Even though well motivated, the choice to use Bayesian probabilities is a challenge to statistics and chemical modelling. Fully Bayesian modelling, Bayesian meta-modelling and bootstrapping are discussed as possible approaches. Deciding how to assess uncertainty is an active choice and should not be constrained by tradition or by a lack of validated and reliable ways of doing it.
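As an illustration of probability modelling of errors with uncertainty measured by Bayesian probabilities, the sketch below assumes that the errors observed on a validation set are independent and normally distributed with unknown mean and variance, and uses the standard noninformative prior p(μ, σ²) ∝ 1/σ². These assumptions, the function names and the error values are hypothetical and are not taken from the article. Sampling σ², then μ, then a new error gives draws from the posterior predictive distribution of the error of a new prediction, from which a 95% predictive interval can be read off.

```python
import random
import statistics


def predictive_error_draws(errors, n_draws=20000, seed=1):
    """Monte Carlo draws from the posterior predictive distribution of a new error.

    Assumes errors are i.i.d. Normal(mu, sigma^2) and uses the
    noninformative prior p(mu, sigma^2) proportional to 1 / sigma^2.
    """
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two observed errors")
    rng = random.Random(seed)
    e_bar = statistics.fmean(errors)
    s2 = statistics.variance(errors)  # sample variance, n - 1 denominator
    nu = n - 1
    draws = []
    for _ in range(n_draws):
        # sigma^2 | errors ~ Scaled-Inv-chi^2(nu, s2); chi^2_nu drawn as Gamma(nu/2, scale=2)
        sigma2 = nu * s2 / rng.gammavariate(nu / 2.0, 2.0)
        # mu | sigma^2, errors ~ Normal(e_bar, sigma^2 / n)
        mu = rng.gauss(e_bar, (sigma2 / n) ** 0.5)
        # new error | mu, sigma^2 ~ Normal(mu, sigma^2)
        draws.append(rng.gauss(mu, sigma2 ** 0.5))
    return draws


def equal_tailed_interval(draws, level=0.95):
    """Equal-tailed interval read off the sorted Monte Carlo draws."""
    ordered = sorted(draws)
    lo = ordered[int((1 - level) / 2 * len(ordered))]
    hi = ordered[int((1 + level) / 2 * len(ordered)) - 1]
    return lo, hi


# Hypothetical errors (observed minus predicted) on a small validation set.
observed_errors = [0.12, -0.35, 0.08, 0.41, -0.22, 0.05, -0.10, 0.30]
lo, hi = equal_tailed_interval(predictive_error_draws(observed_errors))
print(f"95% predictive interval for the error of a new prediction: [{lo:.2f}, {hi:.2f}]")
```

Under these particular assumptions the predictive distribution has a closed form, a Student t with n − 1 degrees of freedom, location ē and scale s·√(1 + 1/n), so the Monte Carlo interval can be checked against it. The sampling version is shown because the same draw-by-draw logic carries over to error models that have no closed-form answer.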

Fig. 1
Fig. 2
Fig. 3

Notes

  1. We provide both ways of expressing the model to demonstrate the transition from the classical statistical model specification, where the probability model is applied to the errors, to the general model specification, where the whole model is probabilistic.

  2. The Bayesian framework is usually presented with parametric models, but it can also be applied to non-parametric models.

  3. It does not have to be the classifier. Later we give an example where C is a variable expressing the reliability of a prediction, given by the number of times a compound is classified as active across a set of ensemble predictions.
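Note 3 describes a reliability variable C obtained by counting how many members of an ensemble classify a compound as active. One simple way to turn such counts into Bayesian probabilities, sketched below as a hypothetical example rather than the article's own model, is a Beta-Binomial meta-model: for each value k of C, the probability that a compound is truly active gets an independent Beta prior (a uniform Beta(1, 1) is assumed here) that is updated with the observed actives and inactives among validation compounds that received k votes. The function names, vote counts and labels are invented for illustration.

```python
from collections import defaultdict


def beta_meta_model(votes, labels, prior_a=1.0, prior_b=1.0):
    """Posterior Beta(a, b) parameters for P(active | C = k), one pair per observed k.

    votes[i]  -- C for compound i: ensemble members that classified it as active
    labels[i] -- 1 if compound i was observed to be active, otherwise 0
    An independent Beta(prior_a, prior_b) prior is assumed for every k.
    """
    counts = defaultdict(lambda: [0, 0])  # k -> [observed actives, observed inactives]
    for k, y in zip(votes, labels):
        counts[k][0 if y else 1] += 1
    # Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + actives, b + inactives)
    return {k: (prior_a + act, prior_b + inact)
            for k, (act, inact) in sorted(counts.items())}


def beta_mean_sd(a, b):
    """Posterior mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var ** 0.5


# Hypothetical validation data for a five-member ensemble.
votes = [5, 5, 5, 4, 4, 4, 4, 1, 1, 0, 0, 0]
labels = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

post = beta_meta_model(votes, labels)
for k, (a, b) in post.items():
    mean, sd = beta_mean_sd(a, b)
    print(f"C = {k}: P(active) ~ Beta({a:g}, {b:g}), mean {mean:.2f}, sd {sd:.2f}")
```

Treating each value of C separately keeps the update transparent but wastes information when some vote counts are rare; pooling across neighbouring values of k, for example with a regression on C, would be a natural refinement.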

Acknowledgments

This work has been funded by the Swedish Research Council Formas through the project 219-2013-1271 “Scaling up uncertain environmental evidence-Quality assurance in ecosystem service predictions” and through the strategic research area Biodiversity and Ecosystems in a Changing Climate (BECC), and by the European Seventh Framework Programme through the CADASTER (CAse studies on the Development and Application of in-Silico Techniques for Environmental hazard and Risk assessment) project FP7-ENV-2007-1-212668. The author wishes to thank Rasmus Bååth and Tom Aldenberg for helpful discussions on Bayesian concepts, and Niklas Vareman and Yann Clough for valuable comments.

Author information

Corresponding author

Correspondence to Ullrika Sahlin.

About this article

Cite this article

Sahlin, U. Assessment of uncertainty in chemical models by Bayesian probabilities: Why, when, how? J Comput Aided Mol Des 29, 583–594 (2015). https://doi.org/10.1007/s10822-014-9822-3

  • DOI: https://doi.org/10.1007/s10822-014-9822-3
