Study of Diversity and Similarity of Large Chemical Databases Using Tanimoto Measure

A. Sankara Rao, S. Durga Bhavani, T. Sobha Rani, Raju S. Bapi, G. Narahari Sastry

doi:10.1007/978-3-642-22786-8_5


Part of the book series: Communications in Computer and Information Science (CCIS, volume 157)

Included in the following conference series:

International Conference on Information Processing


Abstract

ZINC is a freely available chemical database of 27 million compounds, organized into subsets such as Drug-like, Natural Products and FDA, that provides 9 molecular features for each compound. In this paper, we first compute 49 additional molecular features and represent the entire chemical space as 58-length fingerprints. The Tanimoto measure, a popular similarity coefficient, is then used to mine this space for similar and diverse fingerprints. An important issue is the choice of a suitable reference string against which fingerprints are compared, and we experiment with several reference strings to assess their suitability. A fingerprint constructed by requiring a non-trivial presence of every feature is found to work best. We further propose a reference-independent method based on the distribution of pairwise similarities, but this raises the time complexity from linear to quadratic in the number of compounds. A secondary goal of this paper is a scheme for extracting a small sample that reflects the similarity and diversity of the whole population. To this end, we perform stratified sampling of the Natural Products Database (NPD), which contains 90,000 compounds, by dividing it into strata that represent distinct ring structures, and then compute the pairwise similarity profile. The scheme can be extended to the other databases in ZINC.
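The abstract contrasts a linear-time comparison of every compound against one reference fingerprint with a quadratic-time pairwise comparison, and proposes ring-stratified sampling to keep the pairwise approach manageable. The Python sketch below illustrates these ideas under stated assumptions; it is not the authors' implementation. It assumes the continuous (vector) form of the Tanimoto coefficient, non-negative feature values (so similarities fall in [0, 1]) and ring count as the stratum key. The function names, the all-ones reference vector and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence

# Hypothetical type alias: one compound = one vector of non-negative feature
# values (the paper uses 58 features per compound).
Fingerprint = Sequence[float]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Continuous Tanimoto coefficient a.b / (|a|^2 + |b|^2 - a.b).

    On 0/1 vectors this equals the familiar c / (n_a + n_b - c).
    """
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # denom is 0 only when both vectors are all zeros; treat them as identical.
    return dot / denom if denom else 1.0


def reference_profile(fps: List[Fingerprint], ref: Fingerprint) -> List[float]:
    """O(n): compare every compound with one fixed reference fingerprint."""
    return [tanimoto(fp, ref) for fp in fps]


def pairwise_profile(fps: List[Fingerprint]) -> List[float]:
    """O(n^2): similarity of every unordered pair; needs no reference."""
    n = len(fps)
    return [tanimoto(fps[i], fps[j]) for i in range(n) for j in range(i + 1, n)]


def stratified_sample(fps: List[Fingerprint], ring_counts: List[int],
                      fraction: float, seed: int = 0) -> List[Fingerprint]:
    """Draw the same fraction from each ring-count stratum (at least one each)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    strata: Dict[int, List[int]] = defaultdict(list)
    for idx, rings in enumerate(ring_counts):
        strata[rings].append(idx)
    rng = random.Random(seed)
    chosen: List[int] = []
    for rings in sorted(strata):
        members = strata[rings]
        k = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, k))
    return [fps[i] for i in sorted(chosen)]


def histogram(values: List[float], bins: int = 10) -> List[int]:
    """Bucket similarities in [0, 1] into equal-width bins (a similarity profile)."""
    counts = [0] * bins
    for v in values:
        # Small tolerance for floating-point rounding just above 1.0.
        if v < 0.0 or v > 1.0 + 1e-9:
            raise ValueError("similarity outside [0, 1]; are features non-negative?")
        counts[min(int(v * bins), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    # Toy 4-feature vectors standing in for 58-feature fingerprints.
    fps = [[1, 0, 2, 3], [1, 1, 2, 3], [0, 4, 0, 1],
           [2, 0, 2, 2], [0, 3, 1, 1], [1, 1, 1, 1]]
    rings = [0, 0, 1, 1, 2, 2]
    ref = [1, 1, 1, 1]  # hypothetical reference with every feature present
    print(histogram(reference_profile(fps, ref), bins=4))
    sample = stratified_sample(fps, rings, fraction=0.5)
    print(len(sample), histogram(pairwise_profile(sample), bins=4))
```

For scale, all 90,000 NPD compounds would give 90,000 × 89,999 / 2, roughly 4 × 10^9 pairs, whereas the reference profile needs only 90,000 comparisons; running `pairwise_profile` on a stratified sample is one way to keep the reference-independent profile affordable.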



Author information

Authors and Affiliations

Computational Intelligence Lab, Department of Computer and Information Sciences, University of Hyderabad, Hyderabad, India
Sankara Rao A., Durga Bhavani S., Sobha Rani T. & Raju S. Bapi
Molecular Modeling Group, Indian Institute of Chemical Technology, Hyderabad, India
Narahari Sastry G.


Editor information

Editors and Affiliations

College of Engineering, University Visvesvaraya, Bangalore, India
K. R. Venugopal
Defence Institute of Advanced Technology, Pune, India
L. M. Patnaik


About this paper

Cite this paper

Sankara Rao, A., Durga Bhavani, S., Sobha Rani, T., Bapi, R.S., Narahari Sastry, G. (2011). Study of Diversity and Similarity of Large Chemical Databases Using Tanimoto Measure. In: Venugopal, K.R., Patnaik, L.M. (eds) Computer Networks and Intelligent Computing. ICIP 2011. Communications in Computer and Information Science, vol 157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22786-8_5


DOI: https://doi.org/10.1007/978-3-642-22786-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22785-1
Online ISBN: 978-3-642-22786-8
eBook Packages: Computer Science, Computer Science (R0)
