Evaluation of Molecular Similarity and Molecular Diversity Methods Using Biological Activity Data

  • Peter Willett
Part of the Methods in Molecular Biology™ book series (MIMB, volume 275)


This chapter reviews the techniques available for quantifying the effectiveness of methods for molecular similarity and molecular diversity, focusing in particular on similarity searching and on compound selection procedures. The evaluation criteria considered are based on biological activity data, both qualitative and quantitative, and the appropriate criteria differ according to the type of data available.
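
As an illustration of why the two kinds of activity data call for different criteria, the following is a minimal sketch and not a method taken from the chapter. It assumes a similarity search has already ranked a database against a query compound. With qualitative data (active or inactive labels) one plausible criterion is the precision and recall of the top k positions; with quantitative data (measured activity values) one plausible criterion is the mean absolute activity difference between the query and its k nearest neighbours. The function names, the cut-off k, and the sample values are all hypothetical.

```python
from typing import Sequence, Tuple


def top_k_precision_recall(ranked_active: Sequence[bool], k: int) -> Tuple[float, float]:
    """Precision and recall of the top-k positions of a similarity ranking.

    ranked_active[i] is True when the compound at rank i is active
    (qualitative data: only active/inactive labels are known).
    """
    if not 0 < k <= len(ranked_active):
        raise ValueError("k must be between 1 and the length of the ranking")
    hits = sum(ranked_active[:k])          # actives retrieved in the top k
    total_actives = sum(ranked_active)     # actives in the whole ranking
    precision = hits / k
    recall = hits / total_actives if total_actives else 0.0
    return precision, recall


def mean_activity_gap(query_activity: float,
                      ranked_activity: Sequence[float],
                      k: int) -> float:
    """Mean absolute activity difference between the query and its top-k neighbours.

    ranked_activity[i] is the measured activity of the compound at rank i
    (quantitative data). Smaller values mean the nearest neighbours have
    activities closer to that of the query.
    """
    if not 0 < k <= len(ranked_activity):
        raise ValueError("k must be between 1 and the length of the ranking")
    return sum(abs(query_activity - a) for a in ranked_activity[:k]) / k


if __name__ == "__main__":
    # Hypothetical ranking of eight compounds, most similar first.
    labels = [True, False, True, False, False, True, False, False]
    print(top_k_precision_recall(labels, k=4))

    # Hypothetical activity values (for example pIC50) for the top four neighbours.
    print(mean_activity_gap(7.0, [6.5, 7.5, 5.0, 9.0], k=2))
```

The first function needs only class labels, whereas the second needs numeric activities, which is the practical reason the choice of criterion depends on the data to hand.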

Key Words

Chemical database; compound selection; library design; molecular diversity; molecular similarity; neighborhood behavior; similar property principle; similarity searching



Copyright information

© Humana Press Inc. 2004

Authors and Affiliations

  • Peter Willett, Krebs Institute for Biomolecular Research and Department of Information Studies, University of Sheffield, Sheffield, UK
