Abstract
Measuring similarity between objects is a fundamental issue for numerous applications in the data-mining and machine-learning domains. In this paper, we are interested in kernels. We particularly focus on kernel normalization methods, which aim at designing proximity measures that better fit the definition and intuition of a similarity index. To this end, we introduce a new family of normalization techniques that extends cosine normalization. Our approach refines the cosine measure between vectors in the feature space by incorporating another geometry-based score, the ratio of the mapped vectors' norms. We show that the resulting normalized kernels satisfy the basic axioms of a similarity index, unlike most unnormalized kernels. Furthermore, we prove that the proposed normalized kernels are themselves kernels. Finally, we assess these similarity measures on clustering tasks using a kernel PCA based clustering approach. Experiments on several real-world datasets show the potential benefits of the proposed normalized kernels over cosine normalization and the Gaussian RBF kernel.
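The abstract describes extending cosine normalization of a kernel with a second geometric score, the ratio of the mapped vectors' norms. The paper's exact family of normalizations is not reproduced here; the sketch below shows standard cosine normalization of a Gram matrix and one plausible norm-ratio refinement, where the exponent `t` and the min/max ratio form are illustrative assumptions rather than the authors' definition.

```python
import numpy as np

def cosine_normalize(K):
    """Cosine normalization: K(x, y) / sqrt(K(x, x) * K(y, y))."""
    d = np.sqrt(np.diag(K))          # norms of the mapped vectors
    return K / np.outer(d, d)

def norm_ratio_normalize(K, t=1.0):
    """Hypothetical refinement: scale the cosine measure by the
    mapped vectors' norm ratio min(|x|,|y|)/max(|x|,|y|), raised
    to an illustrative power t (t is an assumption, not from the paper)."""
    d = np.sqrt(np.diag(K))
    ratio = np.minimum.outer(d, d) / np.maximum.outer(d, d)
    return cosine_normalize(K) * ratio ** t
```

Both variants leave a unit diagonal and values bounded by 1 in absolute value, consistent with the basic axioms of a similarity index mentioned in the abstract.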
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Ah-Pine, J. (2010). Normalized Kernels as Similarity Indices. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2010. Lecture Notes in Computer Science(), vol 6119. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13672-6_36
Print ISBN: 978-3-642-13671-9
Online ISBN: 978-3-642-13672-6