Journal of Molecular Modeling, Volume 9, Issue 4, pp 235–241

The compressed feature matrix—a fast method for feature based substructure search

  • S. F. Badreddin Abolmaali
  • Jörg K. Wegner
  • Andreas Zell
Original Paper


The compressed feature matrix (CFM) is a feature based molecular descriptor for the fast processing of pharmacochemical applications such as adaptive similarity search, pharmacophore development and substructure search. Depending on the particular purpose, the descriptor may be generated from either topological or Euclidean molecular data. To keep the descriptor broadly applicable, the user is free to decide which structural patterns are assigned to which feature types. This assignment step is based on a graph algorithm for substructure search and in this respect resembles the common substructure descriptors. While those merely allow screening for the predefined patterns, the CFM permits a real substructure/subgraph search, provided that all desired elements of the query substructure are described by the selected feature set. In this work, the CFM based substructure search is evaluated with regard to both the different outputs resulting from varying feature sets and the search speed. As a benchmark we use the programmable atom typer (PATTY) graph algorithm. In this comparison, the CFM based matrix algorithm is up to several hundred times faster than PATTY, and when the CFM is used as a basis for substructure screening, the search is accelerated by three orders of magnitude. Thus, the CFM based substructure search meets the requirements for interactive use, even for the evaluation of several hundred thousand compounds. The concept of the CFM is implemented in the software COFEA.
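The idea of screening on user-assigned features can be illustrated with a minimal Python sketch. It is not the CFM as implemented in COFEA, whose layout is not given here. It assumes, purely for illustration, that a molecule is reduced to counts of (feature type, feature type, topological distance) triples and that a query passes the screen when the target contains each of its triples at least as often. The atom numbering, the feature names "hydroxyl" and "amine", and all function names are hypothetical. Matching on exact bond distances is a simplification, since a larger molecule can offer a shorter path between two features than the query does.

```python
from collections import Counter, deque
from itertools import combinations


def build(bonds):
    """Turn a list of (atom, atom) bonds into an adjacency dictionary."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def feature_pairs(adjacency, features):
    """Count (type, type, distance) triples over all pairs of feature atoms."""
    pairs = Counter()
    for a, b in combinations(sorted(features), 2):
        d = bond_distances(adjacency, a)[b]
        type_a, type_b = sorted((features[a], features[b]))
        pairs[(type_a, type_b, d)] += 1
    return pairs


def passes_screen(query_pairs, target_pairs):
    """True if the target holds every query triple at least as often."""
    return all(target_pairs[key] >= n for key, n in query_pairs.items())


# Heavy-atom graphs (hypothetical numbering): atoms 0-5 form the ring.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
catechol = build(ring + [(0, 6), (1, 7)])  # benzene-1,2-diol
dopamine = build(ring + [(0, 6), (1, 7), (3, 8), (8, 9), (9, 10)])

# Assumed user-defined feature assignment: only heteroatoms carry features.
catechol_features = {6: "hydroxyl", 7: "hydroxyl"}
dopamine_features = {6: "hydroxyl", 7: "hydroxyl", 10: "amine"}

query = feature_pairs(catechol, catechol_features)
target = feature_pairs(dopamine, dopamine_features)

print(passes_screen(query, target))  # True: dopamine may contain the query
print(passes_screen(target, query))  # False: benzene-1,2-diol has no amine
```

With this feature set the ring itself is invisible to the screen, so any compound with two hydroxyl groups three bonds apart would pass. This mirrors the condition stated above: the result is only as specific as the selected feature set.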

Figure: CFM based substructure search using the compounds dopamine and benzene-1,2-diol


Keywords: Substructure search, Descriptor, Features, Computer chemistry, Screening



Abbreviations

  • CFM: compressed feature matrix
  • MCS: maximum common substructure
  • HSCS: highest scoring common substructure
  • SSSR: smallest set of smallest rings
  • ESER: essential set of essential rings
  • ESSR: extended set of smallest rings
  • GSCE: graph of smallest cycles at edges
  • PATTY: programmable atom typer
  • HTS: high throughput screening



This work was carried out within the SOL project (Search and Optimization of Lead structures), which is supported by the German Federal Ministry of Education and Research (bmb+f) under contract number 311681.



Copyright information

© Springer-Verlag 2003

Authors and Affiliations

  • S. F. Badreddin Abolmaali (1), corresponding author
  • Jörg K. Wegner (1)
  • Andreas Zell (1)
  1. Department of Computer Science, University of Tuebingen, Tübingen, Germany
