Clustering of Molecules: Influence of the Similarity Measures

Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


In this paper, we present the results of an experimental study analyzing the effect of various similarity (or distance) measures on the clustering quality of a set of molecules. We focus mainly on clustering approaches that can deal directly with the 2D representation of molecules (i.e., graphs). In this context, we found that an approach based on asymmetrical similarity measures appears relevant. Our experiments are carried out on a dataset from the High Throughput Screening (HTS) domain.
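The abstract does not say which asymmetrical measure is used. Purely as an illustration of what "asymmetrical similarity" means, the sketch below uses the Tversky index on sets of molecular fragments. The choice of measure, the function name `tversky`, and the fragment labels are all assumptions made for this example and are not taken from the paper.

```python
def tversky(a: set, b: set, alpha: float = 1.0, beta: float = 0.0) -> float:
    """Tversky index: |A & B| / (|A & B| + alpha*|A - B| + beta*|B - A|).

    With alpha != beta the measure is asymmetric: sim(a, b) != sim(b, a)
    in general. With alpha == beta == 1 it reduces to the (symmetric)
    Jaccard/Tanimoto coefficient.
    """
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    # Two empty sets are treated as identical (assumption for this sketch).
    return common / denom if denom else 1.0


# Hypothetical fragment sets: `small` is fully contained in `large`.
small = {"C-C", "C=O", "C-N"}
large = {"C-C", "C=O", "C-N", "C-O", "N-H", "C-Cl"}

# Asymmetric setting: only features of the first argument that are
# missing from the second one are penalised.
sim_small_to_large = tversky(small, large)  # 3 / (3 + 0) = 1.0
sim_large_to_small = tversky(large, small)  # 3 / (3 + 3) = 0.5

# Symmetric setting (Jaccard/Tanimoto) gives the same value both ways.
jaccard_ab = tversky(small, large, alpha=1.0, beta=1.0)  # 0.5
jaccard_ba = tversky(large, small, alpha=1.0, beta=1.0)  # 0.5
```

In this toy case the asymmetric measure records that the small molecule is entirely "included" in the large one while the converse is only half true, which a symmetric coefficient cannot express.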


Support Vector Machine; High Throughput Screening; Graph Kernel; Maximum Weight Match; Initial Family
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  1. Centre de Criblage pour Molécules Bioactives, Grenoble Cedex 9, France
  2. Laboratoire TIMC-IMAG, CNRS / UJF 5525, La Tronche, France
  3. Laboratoire Biologie, Informatique, Mathématiques, CEA-DSV-iRTSV, Grenoble Cedex 9, France
