Molecular Similarity Measures

  • Gerald M. Maggiora
  • Veerabahu Shanmugasundaram
Part of the Methods in Molecular Biology book series (MIMB, volume 672)


Molecular similarity is a pervasive concept in chemistry. It is essential to many aspects of chemical reasoning and analysis and is perhaps the fundamental assumption underlying medicinal chemistry. Dissimilarity, the complement of similarity, also plays a major role in a growing number of applications of molecular diversity in combinatorial chemistry, high-throughput screening, and related fields. How molecular information is represented, called the representation problem, is important to the type of molecular similarity analysis (MSA) that can be carried out in any given situation. In this work, four types of mathematical structure are used to represent molecular information: sets, graphs, vectors, and functions. Molecular similarity is a pairwise relationship that induces structure into sets of molecules, giving rise to the concept of chemical space. Although all three concepts – molecular similarity, molecular representation, and chemical space – are treated in this chapter, the emphasis is on molecular similarity measures. Similarity measures, also called similarity coefficients or indices, are functions that map pairs of compatible molecular representations that are of the same mathematical form into real numbers usually, but not always, lying on the unit interval. This chapter presents a somewhat pedagogical discussion of many types of molecular similarity measures, their strengths and limitations, and their relationship to one another. An expanded account of the material on chemical spaces presented in the first edition of this book is also provided. It includes a discussion of the topography of activity landscapes and the role that activity cliffs in these landscapes play in structure–activity studies.
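As a minimal illustration of a similarity measure in the sense described above, the sketch below maps pairs of like-typed representations to real numbers. The choice of the Tanimoto (Jaccard) coefficient for sets and the cosine coefficient for vectors is an assumption made here for illustration, since the abstract names no specific coefficient; the function names and the feature labels are likewise hypothetical.

```python
from math import sqrt


def tanimoto(a: set, b: set) -> float:
    """Set-based similarity |A ∩ B| / |A ∪ B|, which lies on [0, 1]."""
    union = a | b
    if not union:
        # Convention assumed here: two empty feature sets count as identical.
        return 1.0
    return len(a & b) / len(union)


def cosine(u, v) -> float:
    """Vector-based similarity; lies on [-1, 1] when components may be negative."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def dissimilarity(similarity: float) -> float:
    """Complement of a similarity value that lies on the unit interval."""
    return 1.0 - similarity


# Hypothetical feature sets for two molecules (labels are illustrative only).
mol_a = {"aromatic_ring", "hydroxyl", "carboxyl"}
mol_b = {"aromatic_ring", "hydroxyl", "amine"}

s = tanimoto(mol_a, mol_b)   # 2 shared features out of 4 distinct ones
d = dissimilarity(s)
```

Both functions require their two arguments to share the same mathematical form (two sets, or two vectors of equal length), which mirrors the compatibility requirement stated in the abstract. The cosine example also shows why values do not always lie on the unit interval: two vectors pointing in opposite directions give a value of -1.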

Key words

Molecular similarity; Molecular similarity analyses; Dissimilarity; Activity landscapes



Acknowledgments

The authors would like to thank Tom Doman for his constructive comments on the original version of this manuscript, and Mark Johnson, Mic Lajiness, John Van Drie, and Tudor Oprea for helpful discussions. Special thanks are given to Jurgen Bajorath and Jose Medina-Franco for providing several figures and for their helpful comments.



Copyright information

© Humana Press 2011

Authors and Affiliations

  • Gerald M. Maggiora (1)
  • Veerabahu Shanmugasundaram (2)
  1. Department of Pharmacology & Toxicology, College of Pharmacy, University of Arizona, Tucson, USA
  2. Anti-Bacterials Computational Chemistry, Department of Structural Biology, WorldWide Medicinal Chemistry, Pfizer Global Research & Development, Groton, USA
