Evaluation of Molecular Similarity and Molecular Diversity Methods Using Biological Activity Data

Willett, Peter

doi:10.1385/1-59259-802-1:051

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 275)

Abstract

This chapter reviews the techniques available for quantifying the effectiveness of methods for molecular similarity and molecular diversity, focusing in particular on similarity searching and on compound selection procedures. The evaluation criteria considered are based on biological activity data, both qualitative and quantitative, and the appropriate criteria differ considerably depending on which type of data is available.

Author information

Authors and Affiliations

Krebs Institute for Biomolecular Research and Department of Information Studies, University of Sheffield, Sheffield, UK
Peter Willett

Editor information

Editors and Affiliations

Albany Molecular Research Inc., Bothell Research Center, Bothell, WA
Jürgen Bajorath
University of Washington, Seattle, WA
Jürgen Bajorath

Cite this protocol

Willett, P. (2004). Evaluation of Molecular Similarity and Molecular Diversity Methods Using Biological Activity Data. In: Bajorath, J. (ed.) Chemoinformatics. Methods in Molecular Biology™, vol 275. Humana Press. https://doi.org/10.1385/1-59259-802-1:051

DOI: https://doi.org/10.1385/1-59259-802-1:051
Publisher Name: Humana Press
Print ISBN: 978-1-58829-261-2
Online ISBN: 978-1-59259-802-1
eBook Packages: Springer Protocols
