Best Practices for Constructing Reproducible QSAR Models

  • Chanin Nantasenamat
Part of the Methods in Pharmacology and Toxicology book series (MIPT)


Quantitative structure-activity/property relationship (QSAR/QSPR) modeling has been instrumental in elucidating the mechanism of action underlying a biological activity of interest by expressing that activity mathematically as a function of the physicochemical description of chemical structures. Although a growing number of QSAR models are being published in the literature, it is estimated that the majority of them are not reproducible, owing to the heterogeneity of the components of the QSAR model setup (e.g., descriptors, learning algorithms, learning parameters, open-source and commercial software, and different software versions) and to the limited availability of the underlying raw data and analysis source code used to construct the models. This makes it difficult for newcomers and practitioners in the field to reproduce or make use of published QSAR models. However, this is expected to change in light of the growing momentum for open data and data sharing, which is encouraged by funders, publishers, and journals and driven by the next generation of researchers who embrace open science as a means of pushing science forward. This chapter examines these issues and provides general guidelines and best practices for constructing reproducible QSAR models.

Key words

Quantitative structure-activity relationship; Quantitative structure-property relationship; Structure-activity relationship; QSAR; QSPR; SAR; Research reproducibility; Reproducibility; Reproducible; Jupyter; Python



This work is supported by the Research Career Development Grant (No. RSA6280075) from the Thailand Research Fund.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  1. Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
