Maximum common substructure-based Tversky index: an asymmetric hybrid similarity measure
Current approaches for the assessment of molecular similarity can generally be divided into descriptor-based and substructure-based methods. The former require the application of similarity metrics that yield continuous similarity values, whereas the readout of the latter is binary (i.e., similar vs. not similar). However, it is also possible to combine descriptor-based and substructure-based methods to exploit the advantages of each in context and to generate similarity measures for special applications. Herein we present a hybrid measure for asymmetric similarity calculations on the basis of maximum common core structures. This similarity function can be effectively applied to compare small reference compounds with larger test molecules, which is difficult using conventional metrics.
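To illustrate the idea of an asymmetric, substructure-based similarity value, the following minimal sketch evaluates the general Tversky formula on size counts, c / (alpha * (a - c) + beta * (b - c) + c), where a and b are the sizes of the reference and test molecule and c is the size of their maximum common substructure. The choice of size unit (for example, bond or atom counts), the function name `mcs_tversky`, and the example weights are assumptions made for illustration only; the abstract does not specify the exact formulation, and the MCS size is taken here as a precomputed input rather than calculated.

```python
def mcs_tversky(n_ref: int, n_test: int, n_mcs: int, *, alpha: float, beta: float) -> float:
    """Tversky-style similarity from molecule sizes and the size of their MCS.

    n_ref, n_test: sizes (e.g. bond counts) of the reference and test molecule.
    n_mcs: size of the maximum common substructure, assumed to be precomputed.
    alpha, beta: weights on the parts unique to the reference and test molecule.
    This helper is hypothetical and not the authors' implementation.
    """
    if min(n_ref, n_test, n_mcs) < 0:
        raise ValueError("sizes must be non-negative")
    if n_mcs > min(n_ref, n_test):
        raise ValueError("the MCS cannot be larger than either molecule")
    if alpha < 0 or beta < 0:
        raise ValueError("weights must be non-negative")

    unique_ref = n_ref - n_mcs    # part of the reference outside the MCS
    unique_test = n_test - n_mcs  # part of the test molecule outside the MCS
    denominator = alpha * unique_ref + beta * unique_test + n_mcs
    if denominator == 0:
        return 0.0  # both molecules are empty; similarity is undefined, report 0
    return n_mcs / denominator


if __name__ == "__main__":
    # A small reference (10 bonds) fully contained in a larger test molecule (25 bonds).
    symmetric = mcs_tversky(10, 25, 10, alpha=1.0, beta=1.0)   # Tanimoto-like weighting
    asymmetric = mcs_tversky(10, 25, 10, alpha=1.0, beta=0.0)  # ignore the extra test part
    print(symmetric, asymmetric)
```

With equal weights the size difference alone pulls the value down to 0.4, even though the reference is fully contained in the test molecule. Weighting only the reference side yields 1.0, which is the kind of behavior an asymmetric measure is meant to provide when small references are compared with larger molecules.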
Keywords: Molecular similarity; Similarity metrics; Substructure methods; Maximum common substructure; Tversky similarity; Hybrid measures