Journal of Molecular Modeling, Volume 9, Issue 4, pp 235–241

The compressed feature matrix—a fast method for feature based substructure search

  • S. F. Badreddin Abolmaali
  • Jörg K. Wegner
  • Andreas Zell
Original Paper

Abstract

The compressed feature matrix (CFM) is a feature based molecular descriptor for fast processing in pharmacochemical applications such as adaptive similarity search, pharmacophore development and substructure search. Depending on the purpose, the descriptor may be generated from either topological or Euclidean molecular data. To keep the descriptor flexible, the user freely defines which structural patterns are assigned to which feature types. This assignment step relies on a graph algorithm for substructure search and in this respect resembles the common substructure descriptors. Whereas those descriptors only allow screening for predefined patterns, the CFM permits a true substructure (subgraph) search, provided that all desired elements of the query substructure are described by the selected feature set. In this work, the CFM based substructure search is evaluated with regard to both the different outputs resulting from varying feature sets and the search speed. As a benchmark we use the programmable atom typer (PATTY) graph algorithm. When the two methods are compared, the CFM based matrix algorithm is up to several hundred times faster than PATTY, and when the CFM is used as a basis for substructure screening, the search is accelerated by three orders of magnitude. Thus, the CFM based substructure search meets the requirements for interactive use, even when several hundred thousand compounds are evaluated. The concept of the CFM is implemented in the software COFEA.

Figure: CFM based substructure search using the compounds dopamine and benzene-1,2-diol
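The difference between screening and a true substructure search can be illustrated with a small sketch. It is not the CFM algorithm or the COFEA implementation, whose matrix layout the abstract does not specify; it assumes a hypothetical representation in which every heavy atom carries one user-chosen feature type (`Car`, `OH`, `C`, `N` are illustrative labels) and bonds are plain index pairs. Dopamine and benzene-1,2-diol are the compounds named in the figure caption; benzene-1,3-diol is added here only as a counterexample.

```python
from collections import Counter

def make_molecule(features, bonds):
    """Build a feature-labeled graph; features[i] is the feature type of atom i."""
    adj = {i: set() for i in range(len(features))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return {"features": features, "adj": adj}

def passes_screen(query, target):
    """Screening: a necessary condition only (enough atoms of each feature type)."""
    need = Counter(query["features"])
    have = Counter(target["features"])
    return all(have[f] >= n for f, n in need.items())

def is_substructure(query, target):
    """Backtracking search for an injective mapping that preserves features and bonds."""
    qf, tf = query["features"], target["features"]
    qadj, tadj = query["adj"], target["adj"]
    mapping = {}  # query atom index -> target atom index

    def extend(q):
        if q == len(qf):
            return True
        for t in range(len(tf)):
            if tf[t] != qf[q] or t in mapping.values():
                continue
            # Every already-mapped neighbour of q must be bonded to t in the target.
            if all(mapping[n] in tadj[t] for n in qadj[q] if n in mapping):
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]
        return False

    return passes_screen(query, target) and extend(0)

# Hypothetical hand-built heavy-atom graphs; atoms 0-5 form the benzene ring.
RING = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
catechol = make_molecule(["Car"] * 6 + ["OH", "OH"], RING + [(0, 6), (1, 7)])
resorcinol = make_molecule(["Car"] * 6 + ["OH", "OH"], RING + [(0, 6), (2, 7)])
dopamine = make_molecule(
    ["Car"] * 6 + ["OH", "OH", "C", "C", "N"],
    RING + [(0, 6), (1, 7), (3, 8), (8, 9), (9, 10)],
)

print(is_substructure(catechol, dopamine))    # True
print(passes_screen(catechol, resorcinol))    # True: the screen cannot tell them apart
print(is_substructure(catechol, resorcinol))  # False: the hydroxyl groups are not adjacent
```

The screen is cheap and can only rule candidates out, so the graph search runs only on molecules that survive it. The match is also only as specific as the chosen feature set, which mirrors the abstract's proviso that all desired elements of the query must be described by the selected features.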

Keywords

Substructure search, Descriptor, Features, Computer chemistry, Screening

Abbreviations

CFM: compressed feature matrix
MCS: maximum common substructure
HSCS: highest scoring common substructure
SSSR: smallest set of smallest rings
ESER: essential set of essential rings
ESSR: extended set of smallest rings
GSCE: graph of smallest cycles at edges
PATTY: programmable atom typer
HTS: high throughput screening


Copyright information

© Springer-Verlag 2003

Authors and Affiliations

  • S. F. Badreddin Abolmaali (1)
  • Jörg K. Wegner (1)
  • Andreas Zell (1)
  1. Department of Computer Science, University of Tuebingen, Tübingen, Germany
