Clustering of Molecules: Influence of the Similarity Measures
Abstract
In this paper, we present the results of an experimental study analyzing the effect of various similarity (or distance) measures on the quality of clustering a set of molecules. We focus mainly on clustering approaches that can deal directly with the 2D representation of molecules (i.e., graphs). In this context, our results suggest that an approach based on asymmetrical similarity measures is relevant. Our experiments are carried out on a dataset from the High Throughput Screening (HTS) domain.
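To make the notion of an asymmetrical similarity measure concrete, the following minimal sketch contrasts a symmetric measure (Tanimoto) with an asymmetric one (the Tversky index with unequal weights) on sets of molecular fragments. This is an illustration only: the Tversky index is a stand-in chosen here, not necessarily the measure studied in the chapter, and the function names and fragment sets are hypothetical.

```python
def tversky(a, b, alpha, beta):
    """Tversky index between two feature sets.

    alpha weights the features found only in `a`, beta those found only
    in `b`. With alpha != beta the measure is asymmetric.
    """
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denom = common + alpha * only_a + beta * only_b
    # Convention assumed here: two empty sets are treated as identical.
    return common / denom if denom else 1.0


def tanimoto(a, b):
    """Symmetric special case of the Tversky index (alpha = beta = 1)."""
    return tversky(a, b, 1.0, 1.0)


# Hypothetical fragment sets: `small` is entirely contained in `large`.
small = {"C-C", "C=O", "C-O"}
large = {"C-C", "C=O", "C-O", "C-N", "c:c", "C-Cl"}

sym = tanimoto(small, large)                     # same value in both directions
small_in_large = tversky(small, large, 1.0, 0.0)  # how much of `small` is in `large`
large_in_small = tversky(large, small, 1.0, 0.0)  # how much of `large` is in `small`

print(sym, small_in_large, large_in_small)
```

In this sketch the symmetric score cannot distinguish "the small molecule is a substructure of the large one" from "the two molecules overlap by half", whereas the two asymmetric scores differ depending on the direction of comparison.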
Keywords
Support Vector Machine; High Throughput Screening; Graph Kernel; Maximum Weight Match; Initial Family
These keywords were added by machine and not by the authors.
Copyright information
© Springer-Verlag Berlin Heidelberg 2007