Evaluation of Molecular Similarity and Molecular Diversity Methods Using Biological Activity Data

  • Protocol

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 275)

Abstract

This chapter reviews techniques for quantifying the effectiveness of molecular similarity and molecular diversity methods, focusing in particular on similarity searching and on compound selection procedures. The evaluation criteria considered are based on biological activity data, both qualitative and quantitative; rather different criteria are needed depending on the type of data available.

Copyright information

© 2004 Humana Press Inc.

About this protocol

Cite this protocol

Willett, P. (2004). Evaluation of Molecular Similarity and Molecular Diversity Methods Using Biological Activity Data. In: Bajorath, J. (ed.) Chemoinformatics. Methods in Molecular Biology™, vol 275. Humana Press. https://doi.org/10.1385/1-59259-802-1:051

  • DOI: https://doi.org/10.1385/1-59259-802-1:051

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-261-2

  • Online ISBN: 978-1-59259-802-1

  • eBook Packages: Springer Protocols