Functional Feature Extraction and Chemical Retrieval

  • Conference paper
Scientific and Statistical Database Management (SSDBM 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7338)

Abstract

Chemical structural formulas are commonly used to present the structural and functional information of organic chemicals. Searching for chemical structures with similar chemical properties is highly desirable, especially for drug discovery. However, structural search over chemical formulas is challenging because the formulas are highly symbolic and spatially structured. In this paper, we propose a new approach to chemical feature extraction and retrieval. We extract four types of functional features from a Chemical Functional Group (CFG) graph built from a chemical structural formula and, for the first time, use them for chemical retrieval. The extracted functional features are then used for similarity measurement and query retrieval. The performance evaluation shows that the proposed approach achieves promising accuracy and outperforms a state-of-the-art method for chemical retrieval.
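The abstract does not specify the four feature types or the similarity measure, so the following is only a minimal sketch of the general idea of feature-based retrieval, not the paper's method. It assumes each chemical has already been reduced to a bag of functional-group labels (the labels, the `tanimoto` and `rank` helpers, and the toy database are all hypothetical) and ranks database entries against a query with a count-based Tanimoto coefficient, a measure widely used for chemical fingerprints.

```python
from collections import Counter


def tanimoto(a: Counter, b: Counter) -> float:
    """Tanimoto (Jaccard) similarity generalised to feature counts."""
    shared = sum((a & b).values())  # per-feature minimum counts
    total = sum((a | b).values())   # per-feature maximum counts
    return shared / total if total else 0.0


def rank(query: Counter, database: dict[str, Counter], top_k: int = 3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, tanimoto(query, feats)) for name, feats in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical feature bags; a real system would derive these
# from a functional-group graph of each structural formula.
database = {
    "ethanol": Counter({"hydroxyl": 1, "alkyl": 1}),
    "acetic_acid": Counter({"carboxyl": 1, "alkyl": 1}),
    "ethylene_glycol": Counter({"hydroxyl": 2, "alkyl": 1}),
}
query = Counter({"hydroxyl": 1, "alkyl": 1})
results = rank(query, database)

for name, score in results:
    print(f"{name}: {score:.3f}")
```

With these toy inputs the query matches ethanol exactly, followed by ethylene glycol and then acetic acid. Any other set-overlap or vector-space measure could be substituted for `tanimoto` without changing the retrieval loop.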

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tang, P., Hui, S.C., Cong, G. (2012). Functional Feature Extraction and Chemical Retrieval. In: Ailamaki, A., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2012. Lecture Notes in Computer Science, vol 7338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31235-9_34

  • DOI: https://doi.org/10.1007/978-3-642-31235-9_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31234-2

  • Online ISBN: 978-3-642-31235-9

  • eBook Packages: Computer Science, Computer Science (R0)
