Applicability Domain: A Step Toward Confident Predictions and Decidability for QSAR Modeling

  • Supratik Kar
  • Kunal Roy
  • Jerzy Leszczynski
Part of the Methods in Molecular Biology book series (MIMB, volume 1800)


In human safety assessment through quantitative structure–activity relationship (QSAR) modeling, the concept of applicability domain (AD) plays a central role. Principle 3 of the Organisation for Economic Co-operation and Development (OECD) principles for QSAR model validation requires that a predictive QSAR model have “a defined domain of applicability.” Studying the AD makes it possible to estimate the uncertainty in the prediction for a particular molecule from how similar it is to the training compounds used in model development. AD is an active research topic, and many methods have been designed to estimate the competence of a model and the confidence in its outcome for a given prediction task. Characterizing the interpolation space is thus significant in defining the AD. The reported AD methods are diverse and were built on different hypotheses and algorithms. This multiplicity of methodologies confuses end users and makes comparing the AD of different models difficult. In this chapter, we summarize the important concepts of AD, including details of the available methods to compute the AD, along with their thresholds and criteria for estimating AD through training set interpolation in the descriptor space. The ideas of transparent domain and decision domain are also discussed. To help readers determine the AD in their projects, practical examples and available open source software tools are provided.
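As a rough illustration of what "training set interpolation in the descriptor space" can mean in practice, the sketch below implements a simple range-based (bounding box) check: a query compound is treated as inside the AD only if every descriptor value lies within the minimum and maximum seen in the training set. This is one of the simplest interpolation-based approaches and is shown here as an assumption for illustration, not as the method the chapter recommends. The function names and the descriptor values are hypothetical.

```python
from typing import List, Sequence, Tuple


def descriptor_ranges(train: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return the (min, max) of each descriptor column in the training set."""
    columns = list(zip(*train))
    return [(min(col), max(col)) for col in columns]


def in_bounding_box(compound: Sequence[float],
                    ranges: Sequence[Tuple[float, float]]) -> bool:
    """True if every descriptor of the compound lies within the training range."""
    return all(lo <= value <= hi for value, (lo, hi) in zip(compound, ranges))


# Hypothetical training set: 4 compounds described by 3 made-up descriptors.
training_set = [
    [1.2, 150.0, 2.0],
    [2.5, 210.5, 3.0],
    [0.8, 180.2, 1.0],
    [3.1, 300.0, 4.0],
]

ranges = descriptor_ranges(training_set)

# A query whose descriptors all fall inside the training ranges.
inside = in_bounding_box([2.0, 200.0, 2.0], ranges)

# A query whose second descriptor exceeds the training maximum.
outside = in_bounding_box([2.0, 450.0, 2.0], ranges)

print(ranges, inside, outside)
```

A check of this kind only flags extrapolation along individual descriptors; it says nothing about empty regions inside the box or about correlations between descriptors, which is part of why distance-, leverage-, and density-based alternatives exist.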

Key words

Applicability domain; Confidence; In silico; QSAR; Reliability



S.K. and J.L. thank the National Science Foundation (NSF/CREST HRD-1547754 and NSF/RISE HRD-1547836) for financial support. K.R. thanks the UGC, New Delhi, for financial assistance under the UPE II scheme.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Interdisciplinary Center for Nanotoxicity, Department of Chemistry and Biochemistry, Jackson State University, Jackson, USA
  2. Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
