Abstract
This chapter reviews the techniques available for quantifying the effectiveness of molecular similarity and molecular diversity methods, focusing in particular on similarity searching and compound selection procedures. The evaluation criteria considered are based on biological activity data, both qualitative and quantitative; the appropriate criteria differ according to the type of data available.
© 2004 Humana Press Inc.
Cite this protocol
Willett, P. (2004). Evaluation of Molecular Similarity and Molecular Diversity Methods Using Biological Activity Data. In: Bajorath, J. (eds) Chemoinformatics. Methods in Molecular Biology™, vol 275. Humana Press. https://doi.org/10.1385/1-59259-802-1:051
DOI: https://doi.org/10.1385/1-59259-802-1:051
Publisher Name: Humana Press
Print ISBN: 978-1-58829-261-2
Online ISBN: 978-1-59259-802-1
eBook Packages: Springer Protocols