Best Practices for Constructing Reproducible QSAR Models

Protocol in: Ecotoxicological QSARs

Part of the book series: Methods in Pharmacology and Toxicology (MIPT)

Abstract

Quantitative structure-activity/property relationship (QSAR/QSPR) modeling has been instrumental in unraveling the mechanism of action behind a biological activity of interest by formulating that activity mathematically as a function of the physicochemical description of chemical structures. Of the growing number of QSAR models published in the literature, the majority are estimated to be not reproducible, owing to the heterogeneity of the components of the QSAR model setup (e.g., descriptors, learning algorithms, learning parameters, open-source and commercial software, and different software versions) and to the limited availability of the underlying raw data and analysis source code used to construct these models. This poses a challenge for newcomers and practitioners in the field who wish to reproduce or make use of published QSAR models. However, this is expected to change in light of the growing momentum for open data and data sharing, which is encouraged by funders, publishers, and journals and driven by the next generation of researchers who embrace open science as a way of pushing science forward. This chapter examines these issues and provides general guidelines and best practices for constructing reproducible QSAR models.

Acknowledgement

This work is supported by the Research Career Development Grant (No. RSA6280075) from the Thailand Research Fund.

Author information

Corresponding author

Correspondence to Chanin Nantasenamat.

Copyright information

© 2020 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Nantasenamat, C. (2020). Best Practices for Constructing Reproducible QSAR Models. In: Roy, K. (ed) Ecotoxicological QSARs. Methods in Pharmacology and Toxicology. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0150-1_3

  • DOI: https://doi.org/10.1007/978-1-0716-0150-1_3

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-0149-5

  • Online ISBN: 978-1-0716-0150-1

  • eBook Packages: Springer Protocols
