Segregating Confident Predictions of Chemicals’ Properties for Virtual Screening of Drugs

  • Axel J. Soto
  • Ignacio Ponzoni
  • Gustavo E. Vazquez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5518)


In this paper we present a methodology for evaluating the confidence in the prediction of a physicochemical or biological property. Identifying unreliable predictions for compounds is crucial in the modern drug discovery process. The methodology combines the prediction method with a self-organizing map, which allows it to segregate confident predictions from unconfident ones. We applied the method to four different data sets and obtained significant differences in the average predictions of the segregated groups. This approach constitutes a novel way of evaluating confidence, since it not only looks for extrapolation situations but also identifies interpolation problems.
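The abstract does not spell out how the predictor and the self-organizing map are combined, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that a small map is trained on the descriptors of the training compounds, that each node stores the mean absolute error of the training predictions mapped to it, and that a new prediction counts as confident only when its compound lies close to its best-matching node (a rough extrapolation check) and that node has a low stored error (a rough interpolation check). All function names, thresholds, and the toy data are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    # Initialise each node with a copy of a randomly chosen training vector.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.1   # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):  # visit samples in random order
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = lr * math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i, xi in enumerate(x):
                    w[i] += h * (xi - w[i])
    return weights


def node_errors(weights, descriptors, y_true, y_pred):
    """Mean absolute prediction error of the training compounds mapped to each node."""
    sums = {}
    for x, t, p in zip(descriptors, y_true, y_pred):
        node = best_matching_unit(weights, x)
        total, count = sums.get(node, (0.0, 0))
        sums[node] = (total + abs(t - p), count + 1)
    return {node: total / count for node, (total, count) in sums.items()}


def is_confident(weights, errors, x, max_error, max_distance):
    """Assumed rule: near a populated node (extrapolation) with low error (interpolation)."""
    node = best_matching_unit(weights, x)
    if math.dist(weights[node], x) > max_distance:
        return False  # far from everything the map has seen
    return node in errors and errors[node] <= max_error


# Toy data: two clusters of descriptor vectors; the hypothetical model
# predicts well in the first cluster and poorly in the second.
descriptors = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11)]
y_true = [1.0] * 8
y_pred = [1.1] * 4 + [6.0] * 4

som = train_som(descriptors, rows=1, cols=2)
errors = node_errors(som, descriptors, y_true, y_pred)
```

With these toy values, a call such as `is_confident(som, errors, (0.5, 0.5), max_error=1.0, max_distance=3.0)` asks whether a compound near the first cluster would be trusted. Both thresholds are illustrative and would have to be tuned per data set.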


Drug Discovery, Applicability Domain, Unsupervised Learning, Supervised Learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Axel J. Soto (1, 2)
  • Ignacio Ponzoni (1, 2)
  • Gustavo E. Vazquez (1)
  1. Laboratorio de Investigación y Desarrollo en Computación Científica (LIDeCC), Departamento de Ciencias e Ingeniería de la Computación (DCIC), Universidad Nacional del Sur, Bahía Blanca, Argentina
  2. Planta Piloto de Ingeniería Química (PLAPIQUI), UNS - CONICET, Bahía Blanca, Argentina
