Abstract
Chemical structural formulas are commonly used to present the structural and functional information of organic chemicals. Searching for chemical structures with similar chemical properties is highly desirable, especially for drug discovery. However, structural search over chemical formulas is challenging because the formulas are highly symbolic and spatially structured. In this paper, we propose a new approach for chemical feature extraction and retrieval. We extract four types of functional features from a Chemical Functional Group (CFG) graph built from a chemical structural formula and, for the first time, use them for chemical retrieval. The extracted functional features are then used for similarity measurement and query retrieval. The performance evaluation shows that the proposed approach achieves promising accuracy and outperforms a state-of-the-art method for chemical retrieval.
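As a rough illustration of the final step, feature-based similarity measurement and query retrieval, the following minimal sketch ranks a toy database against a query. It is not the paper's method: the abstract does not specify the four feature types or the similarity measure, so the functional-group counts, the compound entries, and the use of a generalized Tanimoto coefficient are all assumptions made for illustration, and the names `tanimoto` and `retrieve` are hypothetical.

```python
from collections import Counter


def tanimoto(a: Counter, b: Counter) -> float:
    """Generalized (min/max) Tanimoto similarity between two feature multisets.

    Assumption: the paper's actual similarity measure is not given in the
    abstract; Tanimoto is used here only as a common stand-in.
    """
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total if total else 0.0


def retrieve(query: Counter, database: dict, top_k: int = 3) -> list:
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, tanimoto(query, feats)) for name, feats in database.items()]
    # Sort by descending score, breaking ties by name for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical functional-group counts; not features taken from the paper.
database = {
    "ethanol": Counter({"hydroxyl": 1, "alkyl": 1}),
    "acetic_acid": Counter({"carboxyl": 1, "alkyl": 1}),
    "ethylene_glycol": Counter({"hydroxyl": 2, "alkyl": 1}),
}
query = Counter({"hydroxyl": 1, "alkyl": 1})

results = retrieve(query, database)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Counting features rather than recording only their presence lets the sketch distinguish a compound with two hydroxyl groups from one with a single hydroxyl group, which a plain set-based comparison would not.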
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tang, P., Hui, S.C., Cong, G. (2012). Functional Feature Extraction and Chemical Retrieval. In: Ailamaki, A., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2012. Lecture Notes in Computer Science, vol 7338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31235-9_34
DOI: https://doi.org/10.1007/978-3-642-31235-9_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31234-2
Online ISBN: 978-3-642-31235-9
eBook Packages: Computer Science, Computer Science (R0)