Journal of Computer-Aided Molecular Design, Volume 30, Issue 7, pp 523–531

Maximum common substructure-based Tversky index: an asymmetric hybrid similarity measure



Current approaches for the assessment of molecular similarity can generally be divided into descriptor-based and substructure-based methods. The former require the application of similarity metrics that yield continuous similarity values, whereas the readout of the latter is binary (i.e. similar vs. not similar). However, it is also possible to combine descriptor-based and substructure-based methods to exploit advantages of individual methods in context and generate similarity measures for special applications. Herein we present a hybrid measure for asymmetric similarity calculations on the basis of maximum common core structures. This similarity function can be effectively applied to compare small reference compounds with larger test molecules, which is difficult using conventional metrics.
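To illustrate how a hybrid measure of this kind might be computed, the sketch below applies the standard Tversky formula to size counts derived from a maximum common substructure (MCS). It assumes that the sizes of the reference molecule, the test molecule, and their MCS are already available as counts (for example, heavy atoms or bonds). The function name `mcs_tversky`, its parameter names, and the choice of counts are illustrative assumptions and not necessarily the definition used by the authors.

```python
def mcs_tversky(size_ref: int, size_test: int, size_mcs: int,
                alpha: float = 1.0, beta: float = 0.0) -> float:
    """Tversky similarity from MCS-derived size counts (illustrative sketch).

    size_ref  -- size of the reference molecule (assumed: atom or bond count)
    size_test -- size of the test molecule, counted the same way
    size_mcs  -- size of their maximum common substructure
    alpha     -- weight on the part unique to the reference
    beta      -- weight on the part unique to the test molecule
    """
    if size_mcs < 0 or size_mcs > min(size_ref, size_test):
        raise ValueError("MCS size must lie between 0 and the smaller molecule size")
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")

    unique_ref = size_ref - size_mcs    # reference part not covered by the MCS
    unique_test = size_test - size_mcs  # test part not covered by the MCS
    denominator = size_mcs + alpha * unique_ref + beta * unique_test
    return size_mcs / denominator if denominator > 0 else 0.0


# Hypothetical sizes: a 10-atom reference fully contained in a 25-atom test molecule.
asymmetric = mcs_tversky(10, 25, 10, alpha=1.0, beta=0.0)  # ignores the extra test atoms
symmetric = mcs_tversky(10, 25, 10, alpha=1.0, beta=1.0)   # Tanimoto-like weighting
swapped = mcs_tversky(25, 10, 10, alpha=1.0, beta=0.0)     # roles of the molecules reversed
```

With `alpha = 1` and `beta = 0`, a small reference that is fully contained in a larger test molecule scores 1.0, whereas the symmetric weighting (`alpha = beta = 1`) penalizes the size difference. Swapping the roles of the two molecules changes the asymmetric score, which is the property that makes such a measure suited to comparing small reference compounds with larger test molecules.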


Keywords: Molecular similarity, Similarity metrics, Substructure methods, Maximum common substructure, Tversky similarity, Hybrid measures



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
