Assessing the Effect of 2D Fingerprint Filtering on ILP-Based Structure-Activity Relationships Toxicity Studies in Drug Design

  • Rui Camacho
  • Max Pereira
  • Vítor Santos Costa
  • Nuno A. Fonseca
  • Carlos J. V. Simões
  • Rui M. M. Brito
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 93)


The rational development of new drugs is a complex and expensive process. A myriad of factors affect the activity of putative candidate molecules in vivo, and the propensity for causing adverse and toxic effects is recognised as the major hurdle behind the current “target-rich, lead-poor” scenario.

Structure-Activity Relationship studies using relational Machine Learning algorithms have already proved very useful in the complex process of rational drug design. However, a typical problem with those studies concerns the use of available repositories of previously studied molecules. Those repositories are often highly biased because they contain many molecules that are similar to each other. This results from the common practice in which an expert chemist starts off with a lead molecule, presumed to have some potential, and then introduces small modifications to produce a set of similar molecules. The resulting sets therefore carry a similarity bias.

In this paper we assess the advantages of filtering out similar molecules in order to improve the application of relational learners to Structure-Activity Relationship (SAR) problems for toxicity prediction. We also assess the advantage of using a relational learner to construct comprehensible models that may provide valuable insights into the workings of toxicity.
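The filtering idea can be illustrated with a minimal sketch. The abstract does not state which similarity measure, threshold, or selection strategy the study uses, so everything below is an assumption made for illustration: fingerprints are modelled as sets of "on" bit positions, similarity is the Tanimoto coefficient, the cut-off is 0.8, and molecules are kept greedily in input order. The names `tanimoto`, `filter_similar`, and the toy molecules are hypothetical.

```python
from typing import Dict, FrozenSet, List

# A 2D fingerprint modelled as the set of bit positions that are switched on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient between two fingerprints."""
    union = len(a | b)
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / union


def filter_similar(fps: Dict[str, Fingerprint], threshold: float = 0.8) -> List[str]:
    """Greedily keep a molecule only if it is below `threshold`
    similarity to every molecule already kept (assumed strategy)."""
    kept: List[str] = []
    for name, fp in fps.items():
        if all(tanimoto(fp, fps[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy fingerprints with made-up bit positions, not real molecules.
toy = {
    "lead": frozenset({1, 2, 3, 4, 5}),
    "analogue_a": frozenset({1, 2, 3, 4, 5, 6}),  # close variant of the lead
    "analogue_b": frozenset({1, 2, 3, 9}),        # more distant variant
    "unrelated": frozenset({20, 21, 22}),
}
selected = filter_similar(toy, threshold=0.8)
print(selected)
```

With these toy inputs, the close variant of the lead is discarded while the more distant variant and the unrelated molecule are retained, which is the kind of reduction in similarity bias the paper sets out to evaluate. A greedy pass is order-dependent; clustering-based selection would be an equally plausible alternative.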


Keywords (added by machine, not by the authors): Hydrogen Bond Donor; Inductive Logic Programming; Relational Learner; Similar Molecule; Toxic Molecule





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Rui Camacho (1)
  • Max Pereira (1)
  • Vítor Santos Costa (2)
  • Nuno A. Fonseca (2)
  • Carlos J. V. Simões (3)
  • Rui M. M. Brito (3)

  1. LIAAD-INESC Porto LA & DEI, FEUP, Universidade do Porto, Portugal
  2. CRACS-INESC Porto LA & DCC/FCUP, Universidade do Porto, Portugal
  3. Chemistry Department, Faculty of Science and Technology, Center for Neuroscience and Cell Biology, Portugal
