Best Practices for Constructing Reproducible QSAR Models
Quantitative structure-activity/property relationship (QSAR/QSPR) modeling has been instrumental in unraveling the mechanism of action underlying a biological activity of interest by formulating that activity mathematically as a function of the physicochemical description of chemical structures. Of the growing number of QSAR models published in the literature, it is estimated that the majority are not reproducible, owing to the heterogeneity of the components of the QSAR model setup (e.g., descriptors, learning algorithms, learning parameters, open-source and commercial software, and differing software versions) and the limited availability of the underlying raw data and analysis source code used to construct these models. This poses a challenge for newcomers and practitioners in the field who wish to reproduce or make use of published QSAR models. However, this is expected to change in light of the growing momentum for open data and data sharing, which is encouraged by funders, publishers, and journals and driven by the next generation of researchers who embrace open science as a way of pushing science forward. This chapter examines these issues and provides general guidelines and best practices for constructing reproducible QSAR models.
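As a concrete illustration of the components named above, the following minimal Python sketch records a model setup (descriptor, learning algorithm, learning parameters, software versions) together with a fingerprint of the raw data in a single manifest that could be shared alongside a published model. It is not taken from the chapter: the function names, the manifest fields, the package list, and all example values are hypothetical, and only the Python standard library is assumed.

```python
import hashlib
import json
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def sha256_of_bytes(data):
    """Fingerprint the raw data so readers can confirm they have the same file."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(descriptor, algorithm, parameters, seed, packages, raw_data):
    """Collect the components of a QSAR model setup into one dictionary."""
    return {
        "descriptor": descriptor,
        "learning_algorithm": algorithm,
        "learning_parameters": parameters,
        "random_seed": seed,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "software_versions": {p: package_version(p) for p in packages},
        "raw_data_sha256": sha256_of_bytes(raw_data),
    }


# Hypothetical example values; substitute the ones used in the actual study.
manifest = build_manifest(
    descriptor="example-descriptor-set",
    algorithm="random forest",
    parameters={"n_estimators": 500, "max_features": "sqrt"},
    seed=42,
    packages=["numpy", "scikit-learn", "rdkit"],
    raw_data=b"smiles,activity\nCCO,5.2\n",
)

# sort_keys keeps the serialized manifest stable across runs.
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

Packages that are not installed are recorded as `null` rather than raising an error, so the manifest also documents what was absent from the environment. In practice the bytes passed as `raw_data` would be read from the actual data set file.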
Key words: Quantitative structure-activity relationship, Quantitative structure-property relationship, Structure-activity relationship, QSAR, QSPR, SAR, Research reproducibility, Reproducibility, Reproducible, Jupyter, Python
This work is supported by the Research Career Development Grant (No. RSA6280075) from the Thailand Research Fund.