Abstract
Data-mining and information-retrieval tasks depend on a good distance function for measuring similarity between data instances. The most effective distance function must be formulated in a context-dependent way, tailored to the application, the data, and the user. In this chapter,\(^\dagger\) we present a method that learns a distance function by capturing the nonlinear relationships among contextual information provided by the application, data, or user. We show that, through a process called the “kernel trick,” such nonlinear relationships can be learned efficiently in a projected space. Theoretically, we substantiate that our method is both sound and optimal. Empirically, using several datasets and applications, we demonstrate that our method is effective and useful.
†© ACM, 2005. This chapter is a minor revision of the author’s work with Gang Wu and Navneet Panda [1] published in KDD’05. Permission to publish this chapter is granted under copyright license #2587641486368.
Notes
- 1.
The kernel trick was first published in 1964 in the paper of Aizerman et al. [7]. The kernel trick has been applied to several algorithms in statistics, including Support Vector Machines and kernel PCA.
- 2.
In this chapter, we consider only the case where \({\fancyscript{S}}\;\hbox{and}\;{\fancyscript{D}}\) are obtained from the class-label information. How to construct \({\fancyscript{S}}\;\hbox{and}\;{\fancyscript{D}}\) is explained at the beginning of Sect. 5.2.
- 3.
Xing’s algorithm cannot be run when the dimensionality of \({\fancyscript{I}}\) is very high or when nonlinear kernels, such as Gaussian and Laplacian, are employed, because its running time does not scale to high-dimensional input spaces or to nonlinear kernels [10]. We thus did not report the corresponding results in these cases.
- 4.
- 5.
Given a kernel function \(K\) and a set of instances \({\fancyscript{X}},\) the kernel matrix (Gram matrix) is the matrix of inner products over all pairs from \({\fancyscript{X}}\times{\fancyscript{X}},\;{\mathbf{K}}=[k_{ij}],\) where \(k_{ij}=K({\mathbf{x}}_i,{\mathbf{x}}_j)\).
- 6.
The kernel trick in (5.1) uniquely links the kernel function with the distance function. The former \((K)\) provides the pairwise-similarity measurement between two instances, whereas the latter \((d)\) provides the pairwise-dissimilarity measurement between two instances. Therefore, a transformation of a prior kernel function implies a transformation of the corresponding prior distance function, and vice versa.
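Notes 5 and 6 can be made concrete with a short sketch. The Gram matrix collects \(K({\mathbf{x}}_i,{\mathbf{x}}_j)\) over all pairs, and the kernel-induced distance follows from expanding the squared distance in the projected feature space: \(d({\mathbf{x}},{\mathbf{y}})^2 = K({\mathbf{x}},{\mathbf{x}}) - 2K({\mathbf{x}},{\mathbf{y}}) + K({\mathbf{y}},{\mathbf{y}})\). The sketch below assumes a Gaussian kernel; the function names `gaussian_kernel`, `gram_matrix`, and `kernel_distance` are illustrative, not part of the chapter.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    # Gaussian (RBF) kernel: K(x, y) = exp(-gamma * ||x - y||^2).
    return np.exp(-gamma * np.sum((x - y) ** 2))

def gram_matrix(X, kernel):
    # Kernel (Gram) matrix: K_ij = K(x_i, x_j) over all pairs from X x X.
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

def kernel_distance(x, y, kernel):
    # Distance induced by a kernel via the kernel trick:
    # d(x, y)^2 = K(x, x) - 2 K(x, y) + K(y, y).
    return np.sqrt(kernel(x, x) - 2 * kernel(x, y) + kernel(y, y))

X = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 2.0])]
K = gram_matrix(X, gaussian_kernel)
print(K.shape)  # (3, 3); the matrix is symmetric with a unit diagonal
print(kernel_distance(X[0], X[0], gaussian_kernel))  # 0.0
```

For the Gaussian kernel, \(K({\mathbf{x}},{\mathbf{x}})=1\), so the induced distance simplifies to \(d({\mathbf{x}},{\mathbf{y}})=\sqrt{2-2K({\mathbf{x}},{\mathbf{y}})}\): a transformation of the kernel directly transforms the distance, as Note 6 states.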
References
G. Wu, E.Y. Chang, N. Panda, Formulating distance functions via the kernel trick, in Proceedings of ACM SIGKDD, 2005, pp. 703–709
C.C. Aggarwal, Towards systematic design of distance functions for data mining applications, in Proceedings of ACM SIGKDD, 2003, pp. 9–18
R. Fagin, R. Kumar, D. Sivakumar, Efficient similarity search and classification via rank aggregation, in Proceedings of ACM SIGMOD Conference on Management of Data, June 2003, pp. 301–312
T. Wang, Y. Rui, S.M. Hu, J.Q. Sun, Adaptive tree similarity learning for image retrieval. Multimedia Syst. 9(2), 131–143 (2003)
Y. Rui, T. Huang, Optimizing learning in image retrieval, in Proceedings of IEEE CVPR, June 2000, pp. 236–245
S. Tong, E. Chang, Support vector machine active learning for image retrieval, in Proceedings of ACM International Conference on Multimedia, 2001, pp. 107–118
M.A. Aizerman, E.M. Braverman, L.I. Rozonoer, Theoretical foundations of the potential function method in pattern recognition learning. Autom. Remote Control 25, 821–837 (1964)
N. Cristianini, J. Shawe-Taylor, A. Elisseeff, J. Kandola, On kernel target alignment, NIPS, 2001, pp. 367–373
B. Schölkopf, A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (MIT Press, Cambridge, 2002)
J.T. Kwok, I.W. Tsang, Learning with idealized kernels, in Proceedings of the Twentieth International Conference on Machine Learning, Washington DC, August 2003, pp. 400–407
Y. Grandvalet, S. Canu, Adaptive scaling for feature selection in SVMs, in Proceedings of NIPS, 2002, pp. 553–560
E. Amaldi, V. Kann, On the approximability of minimizing non-zero variables or unsatisfied relations in linear systems. Theor. Comput. Sci. 209, 237–260 (1998)
M. Fazel, Matrix rank minimization with applications, Ph.D. Thesis, Electrical Engineering Dept, Stanford University, March 2002
V. Vapnik, Statistical Learning Theory (Wiley, New York, 1998)
H.W. Kuhn, A.W. Tucker, Nonlinear programming, in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 1951, pp. 481–492
E. Xing, A. Ng, M. Jordan, S. Russell, Distance metric learning, with application to clustering with side-information, in Proceedings of NIPS, 2002, pp. 505–512
D. Wettschereck, D. Aha, T. Mohri, A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms. Artif. Intell. Rev. 11, 273–314 (1997)
A. Bar-hillel, T. Hertz, N. Shental, D. Weinshall, Learning distance functions using equivalence relations, in Proceedings of International Conference on Machine Learning (ICML), Washington, DC, August 2003, pp. 11–18
M. Nadler, E.P. Smith, Pattern Recognition Engineering (Wiley, New York, 1993)
A. Ben-Hur, D. Horn, H.T. Siegelmann, V. Vapnik, Support vector clustering. J. Mach. Learn. Res. 2, 125–137 (2002)
Z. Zhang, Learning metrics via discriminant kernels and multidimensional scaling: towards expected euclidean representation, in Proceedings of International Conference on Machine Learning (ICML), August 2003, pp. 872–879
A.R. Webb, Multidimensional scaling by iterative majorization using radial basis functions. Pattern Recognit. 28(5), 753–759 (1995)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg and Tsinghua University Press
About this chapter
Cite this chapter
Chang, E.Y. (2011). Formulating Distance Functions. In: Foundations of Large-Scale Multimedia Information Management and Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20429-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20428-9
Online ISBN: 978-3-642-20429-6
eBook Packages: Computer Science, Computer Science (R0)