Journal of Computer-Aided Molecular Design, Volume 21, Issue 5, pp 269–280

Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds

  • Edward O. Cannon
  • Ata Amini
  • Andreas Bender
  • Michael J. E. Sternberg
  • Stephen H. Muggleton
  • Robert C. Glen
  • John B. O. Mitchell

Abstract

We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. The Naive Bayes Classifier treats features independently, whereas ILP combines structural fragments and then creates new features with higher predictive power. SVILP is a recently introduced method that adds a support vector machine after the usual ILP procedures. The performance of the methods is evaluated with several statistical measures, namely recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is the superior method for seven of the 11 classes. The results show that the Bayes Classifier gives the best recall for eight of the 11 targets, but has much lower precision, specificity and F-measure. The SVILP model, on the other hand, has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the superiority of SVILP, we employ McNemar's test, which shows that SVILP performs significantly (p < 0.05) better than both other methods for six of the 11 activity classes, while being superior with less significance for three of the remaining classes. While the Bayes Classifier was previously shown to perform very well in molecular classification studies, these results suggest that SVILP is able to extract additional knowledge from the data, thus improving classification results further.
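As an illustration of the measures named in the abstract, the following minimal sketch computes recall, specificity, precision, F-measure and the Matthews Correlation Coefficient from the four confusion-matrix counts of a single activity class, together with a continuity-corrected McNemar statistic for comparing two classifiers on the same compounds. The function names and the example counts are hypothetical and are not taken from the paper; the abstract does not state which variant of McNemar's test the authors used, so the continuity-corrected chi-squared form is an assumption. ROC area and enrichment factor are omitted because they depend on ranked scores rather than on the confusion matrix alone.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Confusion-matrix measures for one activity class.

    Assumes every denominator is non-zero (at least one predicted
    positive, one true active and one true inactive).
    """
    recall = tp / (tp + fn)            # fraction of actives retrieved
    specificity = tn / (tn + fp)       # fraction of inactives rejected
    precision = tp / (tp + fp)         # fraction of predictions that are active
    f_measure = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


def mcnemar_test(b, c):
    """Continuity-corrected McNemar test (an assumed variant).

    b: compounds classified correctly by method A only
    c: compounds classified correctly by method B only
    Returns the chi-squared statistic (1 degree of freedom) and its p-value.
    """
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=40, fp=10, tn=940, fn=10))
    print(mcnemar_test(b=25, c=10))
```

Only the discordant pairs (b and c) enter McNemar's statistic, which is why it suits a paired comparison of two classifiers evaluated on the same test compounds.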

Keywords

Classification, Feature selection, Machine learning, Molecular similarity, Screening

Supplementary material

10822_2007_9113_MOESM1_ESM.pdf (27 kb)
ESM1 (PDF 28 kb)

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Edward O. Cannon (1)
  • Ata Amini (1, 2)
  • Andreas Bender (1, 3)
  • Michael J. E. Sternberg (1, 2)
  • Stephen H. Muggleton (1, 2)
  • Robert C. Glen (1)
  • John B. O. Mitchell (1)

  1. Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
  2. Structural Bioinformatics, Division of Molecular Biosciences, Faculty of Natural Sciences, Imperial College, London, UK
  3. Novartis Institutes for Biomedical Research, Lead Discovery Informatics, Cambridge, USA
