Chemoinformatics and Computational Chemical Biology pp 133-158

Part of the Methods in Molecular Biology book series (MIMB, volume 672)

Similarity Searching Using 2D Structural Fingerprints

  • Peter Willett
Protocol

Abstract

This chapter reviews the use of molecular fingerprints for chemical similarity searching. The fingerprints encode the presence of 2D substructural fragments in a molecule, and the similarity between a pair of molecules is a function of the number of fragments that they have in common. Although this is a very simple way of estimating the degree of structural similarity between two molecules, it has been found to provide an effective and efficient tool for searching large chemical databases. The review describes the historical development of similarity searching since it was first described in the mid-1980s; reviews the many different coefficients, representations, and weightings that can be combined to form a similarity measure; describes quantitative measures of the effectiveness of similarity searching; and concludes by looking at current developments based on the use of data fusion and machine learning techniques.
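As an illustration of the idea that similarity is a function of the number of shared fragments, the following minimal sketch ranks a small database against a reference structure. It assumes the Tanimoto coefficient, which is only one of the many coefficients the chapter reviews, and it represents each fingerprint as a set of fragment labels. The function names, fragment labels, and molecule names are hypothetical and chosen purely for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for two fingerprints given as sets of fragments.

    Computed as c / (a + b - c), where a and b are the numbers of fragments
    in each molecule and c is the number they have in common.
    """
    a, b = set(fp_a), set(fp_b)
    common = len(a & b)
    union = len(a) + len(b) - common
    # Two empty fingerprints share nothing; return 0.0 by convention here.
    return common / union if union else 0.0


def rank_by_similarity(reference, database):
    """Return (name, score) pairs sorted by decreasing similarity."""
    scores = [(name, tanimoto(reference, fp)) for name, fp in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical fragment labels, not the output of any real fingerprinting tool.
reference = {"C-C", "C=O", "C-N", "c:c"}
database = {
    "mol_1": {"C-C", "C=O", "C-N", "c:c"},
    "mol_2": {"C-C", "C=O", "C-O"},
    "mol_3": {"S=O", "C-Cl"},
}

ranking = rank_by_similarity(reference, database)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Here `mol_2` shares two of its three fragments with the four-fragment reference, giving 2 / (4 + 3 - 2) = 0.4. A production system would typically use fixed-length bit strings rather than Python sets, but the arithmetic is the same.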

Key words

Chemical databases; Chemoinformatics; Data fusion; Fingerprint; Fragment substructure; Machine learning; Similar property principle; Similarity coefficient; Similarity measure; Similarity searching; Weighting scheme

References

  1. 1.
    Rouvray, D. H. (1990) The evolution of the concept of molecular similarity, in Concepts and Applications of Molecular Similarity (Johnson, M. A., and Maggiora, G. M., Eds.), pp 15–42, John Wiley, Chichester.Google Scholar
  2. 2.
    Bender, A., and Glen, R. C. (2004) Molecular similarity: a key technique in molecular informatics. Organic and Biomolecular Chemistry 2, 3204–3218.PubMedCrossRefGoogle Scholar
  3. 3.
    Dean, P. M., (Ed.) (1994) Molecular Similarity in Drug Design, Chapman and Hall, Glasgow.Google Scholar
  4. 4.
    Downs, G. M., and Willett, P. (1995) Similarity searching in databases of chemical structures. Reviews in Computational Chemistry 7, 1–66.Google Scholar
  5. 5.
    Maldonado, A. G., Doucet, J. P., Petitjean, M., and Fan, B.-T. (2006) Molecular similarity and diversity in chemoinformatics: from theory to applications. Molecular Diversity 10, 39–79.PubMedCrossRefGoogle Scholar
  6. 6.
    Nikolova, N., and Jaworska, J. (2003) Approaches to measure chemical similarity – a review. Quantitative Structure-Activity Relationships and Combinatorial Science 22, 1006–1026.Google Scholar
  7. 7.
    Sheridan, R. P., and Kearsley, S. K. (2002) Why do we need so many chemical similarity search methods? Drug Discovery Today 7, 903–911.PubMedCrossRefGoogle Scholar
  8. 8.
    Alvarez, J., and Shoichet, B., (Eds.) (2005) Virtual Screening in Drug Discovery, CRC Press, Boca Raton.Google Scholar
  9. 9.
    Bajorath, J. (2002) Integration of virtual and high-throughput screening. Nature Reviews Drug Discovery 1, 882–894.PubMedCrossRefGoogle Scholar
  10. 10.
    Böhm, H.-J., and Schneider, G., (Eds.) (2000) Virtual Screening for Bioactive Molecules, Wiley-VCH, Weinheim.Google Scholar
  11. 11.
    Klebe, G., (Ed.) (2000) Virtual Screening: An Alternative or Complement to High Throughput Screening, Kluwer, Dordrecht.Google Scholar
  12. 12.
    Lengauer, T., Lemmen, C., Rarey, M., and Zimmermann, M. (2004) Novel technologies for virtual screening. Drug Discovery Today 9, 27–34.PubMedCrossRefGoogle Scholar
  13. 13.
    Oprea, T. I., and Matter, H. (2004) Integrating virtual screening in lead discovery. Current Opinion in Chemical Biology 8, 349–358.PubMedCrossRefGoogle Scholar
  14. 14.
    Gedeck, P., Rhode, B., and Bartels, C. (2006) QSAR – how good is it in practice? Comparison of descriptor sets on an unbiased cross section of corporate data sets. Journal of Chemical Information and Modeling 46, 1924–1936.PubMedCrossRefGoogle Scholar
  15. 15.
    McGaughey, G. B., Sheridan, R. P., Bayly, C. I., Culberson, J. C., Kreatsoulas, C., Lindsley, S., Maiorov, V., Truchon, J.-F., and Cornell, W. D. (2007) Comparison of topological, shape, and docking methods in virtual screening. Journal of Chemical Information and Modeling 47, 1504–1519.PubMedCrossRefGoogle Scholar
  16. 16.
    Sheridan, R. P. (2007) Chemical similarity searches: when is complexity justified? Expert Opinion on Drug Discovery 2, 423–430.CrossRefGoogle Scholar
  17. 17.
    Sheridan, R. P., McGaughey, G. B., and Cornell, W. D. (2008) Multiple protein structures and multiple ligands: effects on the apparent goodness of virtual screening results. Journal of Computer-Aided Molecular Design 22, 257–265.PubMedCrossRefGoogle Scholar
  18. 18.
    Talevi, A., Gavernet, L., and Bruno-Blanch, L. E. (2009) Combined virtual screening strategies. Current Computer-Aided Drug Design 5, 23–37.CrossRefGoogle Scholar
  19. 19.
    Warren, G. L., Andrews, C. W., Capelli, A.-M., Clarke, B., LaLonde, J., Lambert, M. H., Lindvall, M., Nevins, N., Semus, S. F., Senger, S., Tedesco, G., Wall, I. D., Woolven, J. M., Peishoff, C. E., and Head, M. S. (2006) A critical assessment of docking programs and scoring functions. Journal of Medicinal Chemistry 49, 5912–5931.PubMedCrossRefGoogle Scholar
  20. 20.
    Wilton, D., Willett, P., Lawson, K., and Mullier, G. (2003) Comparison of ranking methods for virtual screening in lead-discovery programs. Journal of Chemical Information and Computer Sciences 43, 469–474.PubMedCrossRefGoogle Scholar
  21. 21.
    Bajorath, J., (Ed.) (2004) Chemoinformatics Concepts, Methods and Tools for Drug Discovery, Humana Press, Totowa NJ.Google Scholar
  22. 22.
    Gasteiger, J., and Engel, T., (Eds.) (2003) Chemoinformatics: A Textbook, Wiley-VCH, Weinheim.Google Scholar
  23. 23.
    Leach, A. R., and Gillet, V. J. (2007) An Introduction to Chemoinformatics, 2nd edition, Kluwer, Dordrecht.CrossRefGoogle Scholar
  24. 24.
    Gasteiger, J., (Ed.) (2003) Handbook of Chemoinformatics, Wiley-VCH, Weinheim.Google Scholar
  25. 25.
    Johnson, M. A., and Maggiora, G. M., (Eds.) (1990) Concepts and Applications of Molecular Similarity. John Wiley, New York.Google Scholar
  26. 26.
    Willett, P. (2009) Similarity methods in chemoinformatics. Annual Review of Information Science and Technology 43, 3–71.CrossRefGoogle Scholar
  27. 27.
    Eckert, H., and Bajorath, J. (2007) Molecular similarity analysis in virtual screening: foundations, limitation and novel approaches. Drug Discovery Today 12, 225–233.PubMedCrossRefGoogle Scholar
  28. 28.
    Willett, P. (2006) Similarity-based virtual screening using 2D fingerprints. Drug Discovery Today 11, 1046–1053.PubMedCrossRefGoogle Scholar
  29. 29.
    Hagadone, T. R. (1992) Molecular substructure similarity searching – efficient retrieval in two-dimensional structure databases. Journal of Chemical Information and Computer Sciences 32, 515–521.CrossRefGoogle Scholar
  30. 30.
    Senger, S. (2009) Using Tversky similarity searches for core hopping: finding the needles in the haystack. Journal of Chemical Information and Modeling 49, 1514–1524.PubMedCrossRefGoogle Scholar
  31. 31.
    Willett, P. (1985) An algorithm for chemical superstructure searching. Journal of Chemical Information and Computer Sciences 25, 114–116.CrossRefGoogle Scholar
  32. 32.
    Carhart, R. E., Smith, D. H., and Venkataraghavan, R. (1985) Atom pairs as molecular-features in structure activity studies – definition and applications. Journal of Chemical Information and Computer Sciences 25, 64–73.CrossRefGoogle Scholar
  33. 33.
    Willett, P., Winterman, V., and Bawden, D. (1986) Implementation of nearest-neighbour searching in an online chemical structure search system. Journal of Chemical Information and Computer Sciences 26, 36–41.CrossRefGoogle Scholar
  34. 34.
    Adamson, G. W., and Bush, J. A. (1973) A method for the automatic classification of chemical structures. Information Storage and Retrieval 9, 561–568.CrossRefGoogle Scholar
  35. 35.
    Willett, P., Barnard, J. M., and Downs, G. M. (1998) Chemical similarity searching. Journal of Chemical Information and Computer Sciences 38, 983–996.CrossRefGoogle Scholar
  36. 36.
    Wilkins, C. L., and Randic, M. (1980) A graph theoretical approach to structure-property and structure-activity correlation. Theoretica Chimica Acta 58, 45–68.CrossRefGoogle Scholar
  37. 37.
    Patterson, D. E., Cramer, R. D., Ferguson, A. M., Clark, R. D., and Weinberger, L. E. (1996) Neighbourhood behaviour: a useful concept for validation of “molecular diversity” descriptors. Journal of Medicinal Chemistry 39, 3049–3059.PubMedCrossRefGoogle Scholar
  38. 38.
    Dixon, S. L., and Merz, K. M. (2001) One-dimensional molecular representations and similarity calculations: methodology and validation. Journal of Medicinal Chemistry 44, 3795–3809.PubMedCrossRefGoogle Scholar
  39. 39.
    Papadatos, G., Cooper, A. W. J., Kadirkamanathan, V., Macdonald, S. J. F., McLay, I. M., Pickett, S. D., Pritchard, J. M., Willett, P., and Gillet, V. J. (2009) Analysis of neighborhood behaviour in lead optimisation and array design. Journal of Chemical Information and Modeling 49, 195–208.PubMedCrossRefGoogle Scholar
  40. 40.
    Perekhodtsev, G. D. (2007) Neighbourhood behavior: validation of two-dimensional molecular similarity as a predictor of similar biological activities and docking scores. QSAR and Combinatorial Science 26, 346–351.CrossRefGoogle Scholar
  41. 41.
    Willett, P., and Winterman, V. (1986) A comparison of some measures of inter-molecular structural similarity. Quantitative Structure-Activity Relationships 5, 18–25.CrossRefGoogle Scholar
  42. 42.
    Willett, P. (1987) Similarity and Clustering in Chemical Information Systems, Research Studies Press, Letchworth.Google Scholar
  43. 43.
    Brown, R. D., and Martin, Y. C. (1996) Use of structure-activity data to compare structure-based clustering methods and descriptors for use in compound selection. Journal of Chemical Information and Computer Sciences 36, 572–584.CrossRefGoogle Scholar
  44. 44.
    Brown, R. D., and Martin, Y. C. (1997) The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding. Journal of Chemical Information and Computer Sciences 37, 1–9.CrossRefGoogle Scholar
  45. 45.
    Martin, Y. C., Kofron, J. L., and Traphagen, L. M. (2002) Do structurally similar molecules have similar biological activities? Journal of Medicinal Chemistry 45, 4350–4358.PubMedCrossRefGoogle Scholar
  46. 46.
    Steffen, A., Kogej, T., Tyrchan, C., and Engkvist, O. (2009) Comparison of molecular fingerprint methods on the basis of biological profile data. Journal of Chemical Information and Modeling 49, 338–347.PubMedCrossRefGoogle Scholar
  47. 47.
    Sheridan, R. P., Feuston, B. P., Maiorov, V. N., and Kearsley, S. K. (2004) Similarity to molecules in the training set is a good discriminator for prediction accuracy in QSAR. Journal of Chemical Information and Computer Sciences 44, 1912–1928.PubMedCrossRefGoogle Scholar
  48. 48.
    He, L., and Jurs, P. C. (2005) Assessing the reliability of a QSAR model’s predictions. Journal of Molecular Graphics and Modelling 23, 503–523.PubMedCrossRefGoogle Scholar
  49. 49.
    Bostrom, J., Hogner, A., and Schmitt, S. (2006) Do structurally similar ligands bind in a similar fashion? Journal of Medicinal Chemistry 49, 6716–6725.PubMedCrossRefGoogle Scholar
  50. 50.
    Paolini, G. V., Shapland, R. H. B., van Hoorn, W. P., Mason, J. S., and Hopkins, A. L. (2006) Global mapping of pharmacological space. Nature Biotechnology 24, 805–815.PubMedCrossRefGoogle Scholar
  51. 51.
    Schuffenhauer, A., Floersheim, P., Acklin, P., and Jacoby, E. (2003) Similarity metrics for ligands reflecting the similarity of the target proteins. Journal of Chemical Information and Computer Sciences 43, 391–405.PubMedCrossRefGoogle Scholar
  52. 52.
    Hert, J., Keiser, M. J., Irwin, J. J., Oprea, T. I., and Shoichet, B. K. (2008) Quantifying the relationship among drug classes. Journal of Chemical Information and Modeling 48, 755–765.PubMedCrossRefGoogle Scholar
  53. 53.
    Keiser, M. J., Roth, B. L., Armbruster, B. N., Ernsberger, P., Irwin, J. J., and Shoichet, B. K. (2007) Relating protein pharmacology by ligand chemistry. Nature Biotechnology 25, 197–206.PubMedCrossRefGoogle Scholar
  54. 54.
    Cleves, A. E., and Jain, A. N. (2006) Robust ligand-based modeling of the biological targets of known drugs. Journal of Medicinal Chemistry 49, 2921–2938.PubMedCrossRefGoogle Scholar
  55. 55.
    Stahura, F. L., and Bajorath, J. (2002) Bio- and chemo-informatics beyond data management: crucial challenges and future opportunities. Drug Discovery Today 7, S41–S47.PubMedCrossRefGoogle Scholar
  56. 56.
    Kubinyi, H. (1998) Similarity and dissimilarity: a medicinal chemist’s view. Perspectives in Drug Discovery and Design 911, 225–232.CrossRefGoogle Scholar
  57. 57.
    Maggiora, G. M. (2006) On outliers and activity cliffs – why QSAR often disappoints. Journal of Chemical Information and Modeling 46, 1535.PubMedCrossRefGoogle Scholar
  58. 58.
    Peltason, L., and Bajorath, J. (2007) SAR index: quantifying the nature of structure-activity relationships. Journal of Medicinal Chemistry 50, 5571–5578.PubMedCrossRefGoogle Scholar
  59. 59.
    Todeschini, R., and Consonni, V. (2002) Handbook of Molecular Descriptors, Wiley-VCH, Weinheim.Google Scholar
  60. 60.
    Glen, R. C., and Adams, S. E. (2006) Similarity metrics and descriptor spaces – which combinations to choose? QSAR and Combinatorial Science 25, 1133–1142.CrossRefGoogle Scholar
  61. 61.
    Godden, J. W., Xue, L., Kitchen, D. B., Stahura, F. L., Schermerhorn, E. J., and Bajorath, J. (2002) Median partitioning: a novel method for the selection of representative subsets from large compound pools. Journal of Chemical Information and Computer Sciences 42, 885–893.PubMedCrossRefGoogle Scholar
  62. 62.
    Godden, J. W., Furr, J. R., Xue, L., Stahura, F. L., and Bajorath, J. (2004) Molecular similarity analysis and virtual screening by mapping of consensus positions in binary-tansformed chemical descriptor spaces with variable dimensionality. Journal of Chemical Information and Computer Sciences 44, 21–29.PubMedCrossRefGoogle Scholar
  63. 63.
    Kier, L. B., and Hall, H. L. (1986) Molecular Connectivity in Structure-Activity Analysis, Wiley, New York.Google Scholar
  64. 64.
    Lowell, H., Hall, H. L., and Kier, L. B. (2001) Issues in representation of molecular structure: the development of molecular connectivity. Journal of Molecular Graphics and Modelling 20, 4–18.CrossRefGoogle Scholar
  65. 65.
    Estrada, E., and Uriarte, E. (2001) Recent advances on the use of topological indices in drug discovery research. Current Medicinal Chemistry 8, 1573–1588.PubMedCrossRefGoogle Scholar
  66. 66.
    Raymond, J. W., and Willett, P. (2002) Effectiveness of graph-based and fingerprint-based similarity measures for virtual screening of 2D chemical structure databases. Journal of Computer-Aided Molecular Design 16, 59–71.PubMedCrossRefGoogle Scholar
  67. 67.
    Rarey, M., and Dixon, J. S. (1998) Feature trees: a new molecular similarity measure based on tree matching. Journal of Computer-Aided Molecular Design 12, 471–490.PubMedCrossRefGoogle Scholar
  68. 68.
    Rarey, M., and Stahl, M. (2001) Similarity searching in large combinatorial chemistry spaces. Journal of Computer-Aided Molecular Design 15, 497–520.PubMedCrossRefGoogle Scholar
  69. 69.
    Barker, E. J., Buttar, D., Cosgrove, D. A., Gardiner, E. J., Gillet, V. J., Kitts, P., and Willett, P. (2006) Scaffold-hopping using clique detection applied to reduced graphs. Journal of Chemical Information and Modeling 46, 503–511.PubMedCrossRefGoogle Scholar
  70. 70.
    Stiefl, N., Watson, I. A., Baumann, K., and Zaliani, A. (2006) ErG: 2D pharmacophore descriptions for scaffold hopping. Journal of Chemical Information and Modeling 46, 208–220.PubMedCrossRefGoogle Scholar
  71. 71.
    Mason, J. S., Morize, I., Menard, P. R., Cheney, D. L., Hulme, C., and Labaudiniere, R. F. (1999) New 4-point pharmacophore method for molecular similarity and diversity applications: overview of the method and applications, including a novel approach to the design of combinatorial libraries containing privileged substructures. Journal of Medicinal Chemistry 42, 3251–3264.PubMedCrossRefGoogle Scholar
  72. 72.
    Mount, J., Ruppert, J., Welch, W., and Jain, A. N. (1999) Icepick: a flexible surface-based system for molecular diversity. Journal of Medicinal Chemistry 42, 60–66.PubMedCrossRefGoogle Scholar
  73. 73.
    Cheeseright, T., Mackey, M., Rose, S., and Vinter, A. (2006) Molecular field extrema as descriptors of biological activity: definition and validation. Journal of Chemical Information and Modeling 46, 6650–6676.CrossRefGoogle Scholar
  74. 74.
    Mestres, J., Rohrer, D. C., and Maggiora, G. M. (1997) MIMIC: a molecular-field matching program. Exploiting applicability of molecular similarity approaches. Journal of Computational Chemistry 18, 934–954.CrossRefGoogle Scholar
  75. 75.
    Ballester, P. J., and Richards, W. G. (2007) Ultrafast shape recognition to search compound databases for similar molecular shapes. Journal of Computational Chemistry 28, 1711–1723.PubMedCrossRefGoogle Scholar
  76. 76.
    Rush, T. S., Grant, J. A., Mosyak, L., and Nicholls, A. (2005) A shape-based 3-D scaffold hopping method and its application to a bacterial protein-protein interaction. Journal of Medicinal Chemistry 48, 1489–1495.PubMedCrossRefGoogle Scholar
  77. 77.
    Barnard, J. M. (1993) Substructure searching methods – old and new. Journal of Chemical Information and Computer Sciences 33, 532–538.CrossRefGoogle Scholar
  78. 78.
    Brown, N. (2009) Chemoinformatics – an introduction for computer scientists. ACM Computing Surveys.Google Scholar
  79. 79.
    Adamson, G. W., Cowell, J., Lynch, M. F., McLure, A. H. W., Town, W. G., and Yapp, A. M. (1973) Strategic considerations in the design of screening systems for substructure searches of chemical structure files. Journal of Chemical Documentation 13, 153–157.CrossRefGoogle Scholar
  80. 80.
    Durant, J. L., Leland, B. A., Henry, D. R., and Nourse, J. G. (2002) Re-optimisation of MDL keys for use in drug discovery. Journal of Chemical Information and Modeling 42, 1273–1280.CrossRefGoogle Scholar
  81. 81.
    Hodes, L. (1976) Selection of descriptors according to discrimination and redundancy – application to chemical-structure searching. Journal of Chemical Information and Computer Sciences 16, 88–93.PubMedCrossRefGoogle Scholar
  82. 82.
    Bender, A., Mussa, H. Y., Glen, R. C., and Reiling, S. (2004) Molecular similarity searching using atom environments: information-based feature selection and a naive Bayesian classifier. Journal of Chemical Information and Computer Sciences 44, 170–178.PubMedCrossRefGoogle Scholar
  83. 83.
    Bender, A., Jenkins, J. L., Scheiber, J., Sukuru, S. C. K., Glick, M., and Davies, J. W. (2009) How similar are similarity searching methods? A principal components analysis of molecular descriptor space. Journal of Chemical Information and Modeling 49, 108–119.PubMedCrossRefGoogle Scholar
  84. 84.
    Ewing, T. J. A., Baber, J. C., and Feher, F. (2006) Novel 2D fingerprints for ligand-based virtual screening. Journal of Chemical Information and Modeling 46, 2423–2431.PubMedCrossRefGoogle Scholar
  85. 85.
    Fechner, U., Paetz, J., and Schneider, G. (2005) Comparison of three holographic fingerprint descriptors and their binary counterparts. QSAR and Combinatorial Science 24, 961–967.CrossRefGoogle Scholar
  86. 86.
    Hert, J., Willett, P., Wilton, D. J., Acklin, P., Azzaoui, K., Jacoby, E., and Schuffenhauer, A. (2004) Comparison of topological descriptors for similarity-based virtual screening using multiple bioactive reference structures. Organic and Biomolecular Chemistry 2, 3256–3266.PubMedCrossRefGoogle Scholar
  87. 87.
    Schneider, G., Neidhart, W., Giller, T., and Schmid, G. (1999) “Scaffold-hopping” by topological pharmacophore search: a contribution to virtual screening. Angewandte Chemie-International Edition 38, 2894–2896.CrossRefGoogle Scholar
  88. 88.
    Böhm, H.-J., Flohr, A., and Stahl, M. (2004) Scaffold hopping. Drug Discovery Today: Technologies 1, 217–224.CrossRefGoogle Scholar
  89. 89.
    Brown, N., and Jacoby, E. (2006) On scaffolds and hopping in medicinal chemistry. Mini-Reviews in Medicinal Chemistry 6, 1217–1229.PubMedCrossRefGoogle Scholar
  90. 90.
    Schneider, G., Schneider, P., and Renner, S. (2006) Scaffold-hopping: how far can you jump? QSAR and Combinatorial Science 25, 1162–1171.CrossRefGoogle Scholar
  91. 91.
    Martin, Y. C., and Muchmore, S. (2009) Beyond QSAR: lead hopping to different structures. QSAR & Combinatorial Science 28, 797–801.CrossRefGoogle Scholar
  92. 92.
    Eckert, H., and Bajorath, J. (2006) Determination and mapping of activity-specific descriptor value ranges for the identification of active compounds. Journal of Medicinal Chemistry 49, 2284–2293.PubMedCrossRefGoogle Scholar
  93. 93.
    Xue, L., Godden, J. W., Stahura, F. L., and Bajorath, J. (2003) Design and evaluation of a molecular fingerprint involving the transformation of property descriptor values into a binary classification scheme. Journal of Chemical Information and Computer Sciences 43, 1151–1157.PubMedCrossRefGoogle Scholar
  94. 94.
    Briem, H., and Lessel, U. F. (2000) In vitro and in silico affinity fingerprints: finding similarities beyond structural classes. Perspectives in Drug Discovery and Design 20, 231–244.CrossRefGoogle Scholar
  95. 95.
    Kauvar, L. M., Higgins, D. L., Villar, H. O., Sportsman, J. R., Engqvist-Goldstein, A., Bukar, R., Bauer, K. E., Dilley, H., and Rocke, D. M. (1995) Predicting ligand binding to proteins by affinity fingerprinting. Chemistry & Biology 2, 107–118.CrossRefGoogle Scholar
  96. 96.
    Ormerod, A., Willett, P., and Bawden, D. (1989) Comparison of fragment weighting schemes for substructural analysis, Quantitative Structure-Activity Relationships 8, 115–129.CrossRefGoogle Scholar
  97. 97.
    Goldman, B. B., and Walters, W. P. (2006) Machine learning in computational chemistry. Annual Reports in Computational Chemistry 2, 127–140.CrossRefGoogle Scholar
  98. 98.
    Moock, T. E., Grier, D. L., Hounshell, W. D., Grethe, G., Cronin, K., Nourse, J. G., and Theodosiou, J. (1988) Similarity searching in the organic reaction domain. Tetrahedron Computer Methodology 1, 117–128.CrossRefGoogle Scholar
  99. 99.
    Downs, G. M., Poirrette, A. R., Walsh, P., and Willett, P. (1993) Evaluation of similarity searching methods using activity and toxicity data, in Chemical Structures 2. The International Language of Chemistry. (Warr, W. A., Ed.), pp 409–421, Springer Verlag, Berlin.Google Scholar
  100. 100.
    Azencott, C.-A., Ksikes, A., Swamidass, S. J., Chen, J. H., Ralaivola, L., and Baldi, P. (2007) One- to four-dimensional kernels for virtual screening and the prediction of physical, chemical and biological properties. Journal of Chemical Information and Modeling 47, 965–974.PubMedCrossRefGoogle Scholar
  101. 101.
    Chen, X., and Reynolds, C. H. (2002) Performance of similarity measures in 2D fragment-based similarity searching: comparison of structural descriptors and similarity coefficients. Journal of Chemical Information and Computer Sciences 42, 1407–1414.PubMedCrossRefGoogle Scholar
  102. 102.
    Olah, M., Bologa, C., and Oprea, T. I. (2004) An automated PLS search for biologically relevant QSAR descriptors. Journal of Computer-Aided Molecular Design 18, 437–449.PubMedCrossRefGoogle Scholar
  103. 103.
    Arif, S. M., Holliday, J. D., and Willett, P. (2009) Analysis and use of fragment occurrence data in similarity-based virtual screening. Journal of Computer-Aided Molecular Design 23, 655–668.PubMedCrossRefGoogle Scholar
  104. 104.
    Everitt, B. S., Landau, S., and Leese, M. (2001) Cluster Analysis, 4th edition, Edward Arnold, London.Google Scholar
  105. 105.
    Gower, J. C. (1982) Measures of similarity, dissimilarity and distance, in Encyclopaedia of Statistical Sciences (Kotz, S., Johnson, N. L., and Read, C. B., Eds.), pp 397–405, John Wiley, Chichester.Google Scholar
  106. 106.
    Hubálek, Z. (1982) Coefficients of association and similarity, based on binary (presence-absence) data: an evaluation. Biological Reviews of the Cambridge Philosophical Society 57, 669–689.CrossRefGoogle Scholar
  107. 107.
    Flower, D. R. (1988) On the properties of bit string based measures of chemical similarity. Journal of Chemical Information and Computer Sciences 38, 379–386.Google Scholar
  108. 108.
    Dixon, S. L., and Koehler, R. T. (1999) The hidden component of size in two-dimensional fragment descriptors: side effects on sampling in bioactive libraries. Journal of Medicinal Chemistry 42, 2887–2900.PubMedCrossRefGoogle Scholar
  109. 109.
    Fligner, M. A., Verducci, J. S., and Blower, P. E. (2002) A modification of the Jaccard-Tanimoto similarity index for diverse selection of chemical compounds using binary strings. Technometrics 44, 110–119.CrossRefGoogle Scholar
  110. 110.
    Godden, J. W., Xue, L., and Bajorath, J. (2000) Combinatorial preferences affect molecular similarity/diversity calculations using binary fingerprints and Tanimoto coefficients. Journal of Chemical Information and Computer Sciences 40, 163–166.PubMedCrossRefGoogle Scholar
  111. 111.
    Tversky, A. (1977) Features of similarity. Psychological Review 84, 327–352.CrossRefGoogle Scholar
  112. 112.
    Bradshaw, J. (1997) Introduction to Tversky similarity measure, in MUG ‘97 – 11th Annual Daylight User Group Meeting Laguna Beach CA.Google Scholar
  113. 113.
    Maggiora, G. M., Mestres, J., Hagadone, T. R., and Lajiness, M. S. (1997) Asymmetric similarity and molecular diversity, in 213th National Meeting of the American Chemical Society, April 13–17, 1997, San Francisco, CA.Google Scholar
  114. 114.
    Chen, X., and Brown, F. K. (2006) Asymmetry of chemical similarity. ChemMedChem 2, 180–182.CrossRefGoogle Scholar
  115. 115.
    Wang, Y., Eckert, H., and Bajorath, J. (2007) Apparent asymmetry in fingerprint similarity searching is a direct consequence of differences in bit densities and molecular size. ChemMedChem 2, 1037–1042.PubMedCrossRefGoogle Scholar
  116. 116.
    Wang, Y., and Bajorath, J. (2008) Balancing the influence of molecular complexity on fingerprint similarity searching. Journal of Chemical Information and Modeling 48, 75–84.PubMedCrossRefGoogle Scholar
  117. 117.
    Wang, Y., and Bajorath, J. (2009) Development of a compound-class directed similarity coefficient that accounts for molecular complexity effects in fingerprint searching. Journal of Chemical Information and Modeling 49, 1369–1376.PubMedCrossRefGoogle Scholar
  118. 118.
    Varin, T., Bureau, R., Mueller, C., and Willett, P. (2009) Clustering files of chemical structures using the Székely-Rizzo generalisation of Ward’s method. Journal of Molecular Graphics and Modelling 28, 187–195.PubMedCrossRefGoogle Scholar
  119. 119.
    Gower, J. C., and Legendre, P. (1986) Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification 5, 5–48.CrossRefGoogle Scholar
  120. 120.
    Edgar, S. J., Holliday, J. D., and Willett, P. (2000) Effectiveness of retrieval in similarity searches of chemical databases: a review of performance measures. Journal of Molecular Graphics and Modelling 18, 343–357.PubMedCrossRefGoogle Scholar
  121. 121.
    Willett, P. (2004) The evaluation of molecular similarity and molecular diversity methods using biological activity data. Methods in Molecular Biology 275, 51–63.PubMedCrossRefGoogle Scholar
  122. 122.
    Kearsley, S. K., Sallamack, S., Fluder, E. M., Andose, J. D., Mosley, R. T., and Sheridan, R. P. (1996) Chemical similarity using physicochemical property descriptors. Journal of Chemical Information and Computer Sciences 36, 118–127.CrossRefGoogle Scholar
  123. 123.
    Hert, J., Willett, P., Wilton, D. J., Acklin, P., Azzaoui, K., Jacoby, E., and Schuffenhauer, A. (2004) Comparison of fingerprint-based methods for virtual screening using multiple bioactive reference structures. Journal of Chemical Information and Computer Sciences 44, 1177–1185.PubMedCrossRefGoogle Scholar
  124. 124.
    Cuissart, B., Touffet, F., Crémilleux, B., Bureau, R., and Rault, S. (2002) The maximum common substructure as a molecular depiction in a supervised classification context: experiments in quantitative structure/biodegradability relationships. Journal of Chemical Information and Computer Sciences 42, 1043–1052.PubMedCrossRefGoogle Scholar
  125. 125.
    Triballeau, N., Acher, F., Brabet, I., Pin, J.-P., and Bertrand, H.-O. (2005) Virtual screening workflow development guided by the “Receiver Operating Characteristic” curve approach. Application to high-throughput docking on metabotropic glutamate receptor type 4. Journal of Medicinal Chemistry 48, 2534–2547.PubMedCrossRefGoogle Scholar
  126. 126.
    Truchon, J.-F., and Bayly, C. I. (2007) Evaluating virtual screening methods: good and bad metrics for the “early recognition” problem. Journal of Chemical Information and Modeling 47, 488–508.PubMedCrossRefGoogle Scholar
  127. 127.
    Jain, A. N., and Nicholls, A. (2008) Recommendations for evaluation of computational methods. Journal of Computer-Aided Molecular Design 22, 133–139.PubMedCrossRefGoogle Scholar
  128. 128.
    Nicholls, A. (2008) What do we know and when do we know it? Journal of Computer-Aided Molecular Design 22, 239–255.PubMedCrossRefGoogle Scholar
  129. 129.
    Good, A. C., Hermsmeier, M. A., and Hindle, S. A. (2004) Measuring CAMD technique performance: a virtual screening case study in the design of validation experiments. Journal of Computer-Aided Molecular Design 18, 529–536.PubMedCrossRefGoogle Scholar
  130. 130.
    Willett, P. (2006) Data fusion in ligand-based virtual screening. QSAR and Combinatorial Science 25, 1143–1152.CrossRefGoogle Scholar
  131. 131.
    Feher, M. (2006) Consensus scoring for protein-ligand interactions. Drug Discovery Today 11, 421–428.PubMedCrossRefGoogle Scholar
  132. 132.
    Ginn, C. M. R., Turner, D. B., Willett, P., Ferguson, A. M., and Heritage, T. W. (1997) Similarity searching in files of three-dimensional chemical structures: evaluation of the EVA descriptor and combination of rankings using data fusion. Journal of Chemical Information and Computer Sciences 37, 23–37.CrossRefGoogle Scholar
  133. 133.
    Ginn, C. M. R., Willett, P., and Bradshaw, J. (2000) Combination of molecular similarity measures using data fusion. Perspectives in Drug Discovery and Design 20, 1–16.CrossRefGoogle Scholar
  134. 134.
    Sheridan, R. P., Miller, M. D., Underwood, D. J., and Kearsley, S. K. (1996) Chemical similarity using geometric atom pair descriptors. Journal of Chemical Information and Computer Sciences 36, 128–136.CrossRefGoogle Scholar
  135. 135.
    Holliday, J. D., Hu, C.-Y., and Willett, P. (2002) Grouping of coefficients for the calculation of inter-molecular similarity and dissimilarity using 2D fragment bit-strings. Combinatorial Chemistry and High-Throughput Screening 5, 155–166.PubMedGoogle Scholar
  136. 136.
    Salim, N., Holliday, J. D., and Willett, P. (2003) Combination of fingerprint-based similarity coefficients using data fusion. Journal of Chemical Information and Computer Sciences 43, 435–442.PubMedCrossRefGoogle Scholar
  137. 137.
    Whittle, M., Gillet, V. J., Willett, P., Alex, A., and Loesel, J. (2004) Enhancing the effectiveness of virtual screening by fusing nearest neighbor lists: a comparison of similarity coefficients. Journal of Chemical Information and Computer Sciences 44, 1840–1848.PubMedCrossRefGoogle Scholar
  138. 138.
    Xue, L., Stahura, F. L., Godden, J. W., and Bajorath, J. (2001) Fingerprint scaling increases the probability of identifying molecules with similar activity in virtual screening calculations. Journal of Chemical Information and Computer Sciences 41, 746–753.PubMedCrossRefGoogle Scholar
  139. 139.
    Williams, C. (2006) Reverse fingerprinting, similarity searching by group fusion and fingerprint bit importance. Molecular Diversity 10, 311–332.PubMedCrossRefGoogle Scholar
  140. 140.
    Zhang, Q., and Muegge, I. (2006) Scaffold hopping through virtual screening using 2D and 3D similarity descriptors: ranking, voting, and consensus scoring. Journal of Medicinal Chemistry 49, 1536–1548.PubMedCrossRefGoogle Scholar
  141. 141.
    Hert, J., Willett, P., Wilton, D. J., Acklin, P., Azzaoui, K., Jacoby, E., and Schuffenhauer, A. (2005) Enhancing the effectiveness of similarity-based virtual screening using nearest-neighbour information. Journal of Medicinal Chemistry 48, 7049–7054.PubMedCrossRefGoogle Scholar
  142. 142.
    Hert, J., Willett, P., Wilton, D. J., Acklin, P., Azzaoui, K., Jacoby, E., and Schuffenhauer, A. (2006) New methods for ligand-based virtual screening: use of data-fusion and machine-learning techniques to enhance the effectiveness of similarity searching. Journal of Chemical Information and Modeling 46, 462–470.PubMedCrossRefGoogle Scholar
  143. 143.
    Gardiner, E. J., Gillet, V. J., Haranczyk, M., Hert, J., Holliday, J. D., Malim, N., Patel, Y., and Willett, P. (2009) Turbo similarity searching: effect of fingerprint and dataset on virtual-screening performance. Statistical Analysis and Data Mining 2, 103–114.CrossRefGoogle Scholar
  144. 144.
    Baber, J. C., Shirley, W. A., Gao, Y., and Feher, M. (2006) The use of consensus scoring in ligand-based virtual screening. Journal of Chemical Information and Modelling 46, 277–288.CrossRefGoogle Scholar
  145. 145.
    Whittle, M., Gillet, V. J., Willett, P., and Loesel, J. (2006) Analysis of data fusion methods in virtual screening: theoretical model. Journal of Chemical Information and Modeling 46, 2193–2205.PubMedCrossRefGoogle Scholar
  146. 146.
    Whittle, M., Gillet, V. J., Willett, P., and Loesel, J. (2006) Analysis of data fusion methods in virtual screening: similarity and group fusion. Journal of Chemical Information and Modeling 46, 2206–2219.PubMedCrossRefGoogle Scholar
  147. 147.
    Cramer, R. D., Redl, G., and Berkoff, C. E. (1974) Substructural analysis. A novel approach to the problem of drug design. Journal of Medicinal Chemistry 17, 533–535.PubMedCrossRefGoogle Scholar
  148. 148.
    Capelli, A. M., Feriani, A., Tedesco, G., and Pozzan, A. (2006) Generation of a focused set of GSK compounds biased toward ligand-gated ion-channel ligands. Journal of Chemical Information and Modeling 46, 659–664.PubMedCrossRefGoogle Scholar
  149. 149.
    Cosgrove, D. A., and Willett, P. (1998) SLASH: a program for analysing the functional groups in molecules. Journal of Molecular Graphics and Modelling 16, 19–32.PubMedCrossRefGoogle Scholar
  150. 150.
    Medina-Franco, J. L., Petit, J., and Maggiora, G. M. (2006) Hierarchical strategy for identifying active chemotype classes in compound databases. Chemical Biology & Drug Design 67, 395–408.CrossRefGoogle Scholar
  151. 151.
    Schreyer, S. K., Parker, C. N., and Maggiora, G. M. (2004) Data shaving: a focused screening approach. Journal of Chemical Information and Computer Sciences 44, 470–479.PubMedCrossRefGoogle Scholar
  152. 152.
    Hassan, M., Brown, R. D., Varma-O’Brien, S., and Rogers, D. (2006) Cheminformatics analysis and learning in a data pipelining environment. Molecular Diversity 10, 283–299.PubMedCrossRefGoogle Scholar
  153. 153.
    Rogers, D., Brown, R. D., and Hahn, M. (2005) Using extended-connectivity fingerprints with Laplacian-modified Bayesian analysis in high-throughput screening follow-up. Journal of Biomolecular Screening 10, 682–686.PubMedCrossRefGoogle Scholar
  154. 154.
    Xia, X. Y., Maliski, E. G., Gallant, P., and Rogers, D. (2004) Classification of kinase inhibitors using a Bayesian model. Journal of Medicinal Chemistry 47, 4463–4470.PubMedCrossRefGoogle Scholar
  155. 155.
    Bender, A., Mussa, H. Y., Glen, R. C., and Reiling, S. (2004) Similarity searching of chemical databases using atom environment descriptors: evaluation of performance. Journal of Chemical Information and Computer Sciences 44, 1708–1718.PubMedCrossRefGoogle Scholar
  156. 156.
    Vogt, M., Nisius, B., and Bajorath, J. (2009) Predicting the similarity search performance of fingerprints and their combination with molecular property descriptors using probabilistic and information theoretic modeling. Statistical Analysis and Data Mining 2, 123–134.CrossRefGoogle Scholar
  157. 157.
    Vogt, M., and Bajorath, J. (2008) Bayesian screening for active compounds in high-dimensional chemical spaces combining property descriptors and molecular fingerprints. Chemical and Biological Drug Design 71, 8–14.CrossRefGoogle Scholar
  158. 158.
    Wang, Y., and Bajorath, J. (2008) Bit silencing in fingerprints enables the derivation of compound class-directed similarity metrics. Journal of Chemical Information and Modeling 48, 1754–1759.PubMedCrossRefGoogle Scholar
  159. 159.
    Vogt, I., and Bajorath, J. (2007) Analysis of a high-throughput screening data set using potency-scaled molecular similarity algorithms. Journal of Chemical Information and Modeling 47, 367–375.PubMedCrossRefGoogle Scholar
  160. 160.
    Geppert, H., Horvath, T., Gartner, T., Wrobel, S., and Bajorath, J. (2008) Support-vector-machine-based ranking significantly improves the effectiveness of similarity searching using 2D fingerprints and multiple reference compounds. Journal of Chemical Information and Modeling 48, 742–746.PubMedCrossRefGoogle Scholar
  161. 161.
    Shemetulskis, N. E., Weininger, D., Blankey, C. J., Yang, J. J., and Humblet, C. (1996) Stigmata: an algorithm to determine structural commonalities in diverse datasets. Journal of Chemical Information and Computer Sciences 36, 862–871.PubMedCrossRefGoogle Scholar
  162. 162.
    Tovar, A., Eckert, H., and Bajorath, J. (2007) Comparison of 2D fingerprint methods for multiple-template similarity searching on compound activity classes of increasing structural diversity. ChemMedChem 2, 208–217.PubMedCrossRefGoogle Scholar
  163. 163.
    Hessler, G., Zimmermann, M., Matter, H., Evers, A., Naumann, T., Lengauer, T., and Rarey, M. (2005) Multiple-ligand-based virtual screening: methods and applications of the MTree approach. Journal of Medicinal Chemistry 48, 6575–6584.PubMedCrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Peter Willett (1)
  1. Department of Information Studies, The University of Sheffield, Sheffield, UK
