Molecular Fragment Mining for Drug Discovery

  • Christian Borgelt
  • Michael R. Berthold
  • David E. Patterson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3571)


The main task of drug discovery is to find novel bioactive molecules, i.e., chemical compounds that, for example, protect human cells against a virus. One way to support solving this task is to analyze a database of known and tested molecules in order to find structural properties of molecules that determine whether a molecule will be active or inactive, so that future chemical tests can be focused on the most promising candidates. A promising approach to this task was presented in [2]: an algorithm for finding molecular fragments that discriminate between active and inactive molecules. In this paper we review this approach as well as two extensions: a special treatment of rings and a method to find fragments with wildcards based on chemical expert knowledge.
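The abstract does not spell out how a fragment "discriminates" between the two classes. One plausible reading, suggested by the "Minimum Support Threshold" keyword, is that a fragment should occur in many active molecules and in few inactive ones. The sketch below illustrates only that filtering idea, under loud assumptions: molecules are reduced to sets of fragment labels (a real miner would test subgraph containment and grow fragments in a search tree), and the names `support`, `discriminative_fragments`, `min_active_support`, and `max_inactive_support` are hypothetical, not taken from the paper.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy representation: each molecule is reduced to the set of
# fragment labels it contains. This stands in for subgraph containment.
Molecule = FrozenSet[str]


def support(fragment: str, molecules: List[Molecule]) -> float:
    """Return the fraction of molecules that contain the fragment."""
    if not molecules:
        return 0.0
    return sum(fragment in m for m in molecules) / len(molecules)


def discriminative_fragments(
    active: List[Molecule],
    inactive: List[Molecule],
    min_active_support: float,
    max_inactive_support: float,
) -> Dict[str, Tuple[float, float]]:
    """Keep fragments that are frequent among actives and rare among inactives.

    Both thresholds are assumed parameters chosen for illustration.
    """
    # Only fragments that occur in at least one active molecule are candidates.
    candidates = set().union(*active) if active else set()
    result: Dict[str, Tuple[float, float]] = {}
    for frag in sorted(candidates):
        s_act = support(frag, active)
        s_inact = support(frag, inactive)
        if s_act >= min_active_support and s_inact <= max_inactive_support:
            result[frag] = (s_act, s_inact)
    return result


if __name__ == "__main__":
    # Made-up fragment labels, used only to exercise the functions.
    active = [
        frozenset({"N-C-S", "C=O"}),
        frozenset({"N-C-S"}),
        frozenset({"N-C-S", "C-C"}),
    ]
    inactive = [frozenset({"C=O", "C-C"}), frozenset({"C-C"})]
    print(discriminative_fragments(active, inactive, 0.6, 0.1))
```

Using two separate thresholds, rather than a single difference score, keeps the criterion easy to read; whether the reviewed algorithm uses exactly this form is not stated on this page.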


Keywords: Sulfur Atom; Search Tree; Ring Extension; Minimum Support Threshold; Frequent Subgraph
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Agrawal, R., Imielinski, T., Swami, A.: Mining Association Rules between Sets of Items in Large Databases. In: Proc. Conf. on Management of Data, pp. 207–216. ACM Press, New York (1993)
  2. Borgelt, C., Berthold, M.R.: Mining Molecular Fragments: Finding Relevant Substructures of Molecules. In: Proc. IEEE Int. Conf. on Data Mining (ICDM 2002), Maebashi, Japan, pp. 51–58. IEEE Press, Piscataway (2002)
  3. Hofer, H., Borgelt, C., Berthold, M.R.: Large Scale Mining of Molecular Fragments with Wildcards. In: Proc. 5th Int. Symposium on Intelligent Data Analysis (IDA2003), Berlin, Germany, pp. 376–385. Springer, Heidelberg (2003)
  4. Kramer, S., de Raedt, L., Helma, C.: Molecular Feature Mining in HIV Data. In: Proc. 7th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD 2001), San Francisco, CA, pp. 136–143. ACM Press, New York (2001)
  5. Kuramochi, M., Karypis, G.: An Efficient Algorithm for Discovering Frequent Subgraphs. Technical Report TR 02-026, Dept. of Computer Science/Army HPC Research Center, University of Minnesota, Minneapolis, USA (2002)
  6. Meinl, T., Borgelt, C., Berthold, M.R.: Mining Fragments with Fuzzy Chains in Molecular Databases. In: Proc. 2nd Int. Workshop on Mining Graphs, Trees and Sequences (MGTS 2004), pp. 49–60. University of Pisa, Pisa (2004)
  7. Weislow, O., Kiser, R., Fine, D., Bader, J., Shoemaker, R., Boyd, M.: New Soluble Formazan Assay for HIV-1 Cytopathic Effects: Application to High Flux Screening of Synthetic and Natural Products for AIDS Antiviral Activity. Journal of the National Cancer Institute 81, 577–586 (1989)
  8. Zaki, M., Parthasarathy, S., Ogihara, M., Li, W.: New Algorithms for Fast Discovery of Association Rules. In: Proc. 3rd Int. Conf. on Knowledge Discovery and Data Mining (KDD 1997), pp. 283–296. AAAI Press, Menlo Park (1997)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Christian Borgelt (1)
  • Michael R. Berthold (2)
  • David E. Patterson (3)

  1. School of Computer Science, Otto-von-Guericke-University of Magdeburg, Magdeburg, Germany
  2. Department of Computer Science, University of Konstanz, Konstanz, Germany
  3. Tripos Inc., St Louis, USA
