Formulating Distance Functions

Abstract

Tasks in data mining and information retrieval depend on a good distance function for measuring similarity between data instances. The most effective distance function must be formulated in a context-dependent (i.e., application-, data-, and user-dependent) way. In this chapter,\(^\dagger\) we present a method that learns a distance function by capturing the nonlinear relationships among contextual information provided by the application, data, or user. We show that, through a process called the “kernel trick,” such nonlinear relationships can be learned efficiently in a projected space. Theoretically, we substantiate that our method is both sound and optimal. Empirically, using several datasets and applications, we demonstrate that our method is effective and useful.

© ACM, 2005. This chapter is a minor revision of the author’s work with Gang Wu and Navneet Panda [1] published in KDD’05. Permission to publish this chapter is granted under copyright license #2587641486368.

Notes

  1. The kernel trick was first published in 1964 in the paper of Aizerman et al. [7]. It has since been applied to several algorithms in statistics, including Support Vector Machines and kernel PCA.

  2. In this chapter, we consider only the case where \(\mathcal{S}\) and \(\mathcal{D}\) are obtained from the class-label information. How to construct \(\mathcal{S}\) and \(\mathcal{D}\) is explained at the beginning of Sect. 5.2.

  3. Xing’s algorithm cannot be run when the dimensionality of \(\mathcal{I}\) is very high or when nonlinear kernels, such as Gaussian and Laplacian, are employed, because its computational time does not scale well with the dimensionality of the input space or with nonlinear kernels [10]. We thus did not report the corresponding results for these cases.

  4. Contextual information is also called side information in some papers, such as [16, 18].

  5. Given a kernel function \(K\) and a set of instances \(\mathcal{X}\), the kernel matrix (Gram matrix) is the matrix of inner products over all possible pairs from \(\mathcal{X}\times\mathcal{X}\): \({\mathbf{K}}=[k_{ij}]\), where \(k_{ij}=K({\mathbf{x}}_i,{\mathbf{x}}_j)\). (A minimal computational sketch follows after these notes.)

  6. The kernel trick in (5.1) uniquely links the kernel function with the distance function: the former \((K)\) provides the pairwise-similarity measurement between two instances, whereas the latter \((d)\) provides the pairwise-dissimilarity measurement. Therefore, a transformation on a prior kernel function is also a transformation on a prior distance function, and vice versa (see the sketch after these notes).
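
To make Notes 5 and 6 concrete, here is a minimal sketch in Python with NumPy; the Gaussian kernel, the variable names, and the toy data are our illustrative choices, not the chapter’s. It builds a Gram matrix over a small instance set and computes the distance induced by the kernel trick. The identity \(d({\mathbf{x}},{\mathbf{y}})^2=K({\mathbf{x}},{\mathbf{x}})-2K({\mathbf{x}},{\mathbf{y}})+K({\mathbf{y}},{\mathbf{y}})\) is the standard form of the kernel-trick distance, i.e., the Euclidean distance between the images of the two instances in the projected feature space; Eq. (5.1) in the chapter presumably takes this form.

    import numpy as np

    def gaussian_kernel(x, y, gamma=1.0):
        # Gaussian (RBF) kernel: K(x, y) = exp(-gamma * ||x - y||^2)
        return np.exp(-gamma * np.sum((x - y) ** 2))

    def gram_matrix(X, kernel):
        # Kernel (Gram) matrix K = [k_ij] with k_ij = K(x_i, x_j),
        # taken over all pairs from X x X (cf. Note 5)
        n = len(X)
        return np.array([[kernel(X[i], X[j]) for j in range(n)]
                         for i in range(n)])

    def kernel_distance(K, i, j):
        # Distance induced by the kernel trick (cf. Note 6):
        # d(x_i, x_j)^2 = K(x_i, x_i) - 2 K(x_i, x_j) + K(x_j, x_j)
        return np.sqrt(K[i, i] - 2 * K[i, j] + K[j, j])

    # Toy data: three 2-D instances (hypothetical, for illustration only)
    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
    K = gram_matrix(X, gaussian_kernel)
    print(K)                         # 3x3 Gram matrix; ones on the diagonal
    print(kernel_distance(K, 0, 1))  # pairwise dissimilarity of x_0 and x_1

Note that a Gaussian kernel gives \(K({\mathbf{x}},{\mathbf{x}})=1\) for every instance, so the induced distance satisfies \(d^2=2-2K({\mathbf{x}},{\mathbf{y}})<2\); any transformation of the prior kernel \(K\) therefore reshapes \(d\) directly, which is exactly the link Note 6 describes.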

References

  1. G. Wu, E.Y. Chang, N. Panda, Formulating distance functions via the kernel trick, in Proceedings of ACM SIGKDD, 2005, pp. 703–709
  2. C.C. Aggarwal, Towards systematic design of distance functions for data mining applications, in Proceedings of ACM SIGKDD, 2003, pp. 9–18
  3. R. Fagin, R. Kumar, D. Sivakumar, Efficient similarity search and classification via rank aggregation, in Proceedings of ACM SIGMOD Conference on Management of Data, June 2003, pp. 301–312
  4. T. Wang, Y. Rui, S.M. Hu, J.Q. Sun, Adaptive tree similarity learning for image retrieval. Multimedia Syst. 9(2), 131–143 (2003)
  5. Y. Rui, T. Huang, Optimizing learning in image retrieval, in Proceedings of IEEE CVPR, June 2000, pp. 236–245
  6. S. Tong, E. Chang, Support vector machine active learning for image retrieval, in Proceedings of ACM International Conference on Multimedia, 2001, pp. 107–118
  7. M.A. Aizerman, E.M. Braverman, L.I. Rozonoer, Theoretical foundations of the potential function method in pattern recognition learning. Autom. Remote Control 25, 821–837 (1964)
  8. N. Cristianini, J. Shawe-Taylor, A. Elisseeff, J. Kandola, On kernel target alignment, in Proceedings of NIPS, 2001, pp. 367–373
  9. B. Schölkopf, A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (MIT Press, Cambridge, 2002)
  10. J.T. Kwok, I.W. Tsang, Learning with idealized kernels, in Proceedings of the Twentieth International Conference on Machine Learning, Washington, DC, August 2003, pp. 400–407
  11. Y. Grandvalet, S. Canu, Adaptive scaling for feature selection in SVMs, in Proceedings of NIPS, 2002, pp. 553–560
  12. E. Amaldi, V. Kann, On the approximability of minimizing non-zero variables or unsatisfied relations in linear systems. Theor. Comput. Sci. 209, 237–260 (1998)
  13. M. Fazel, Matrix rank minimization with applications. Ph.D. thesis, Electrical Engineering Department, Stanford University, March 2002
  14. V. Vapnik, Statistical Learning Theory (Wiley, New York, 1998)
  15. H.W. Kuhn, A.W. Tucker, Nonlinear programming, in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability (University of California Press, Berkeley, 1951), pp. 481–492
  16. E. Xing, A. Ng, M. Jordan, S. Russell, Distance metric learning, with application to clustering with side-information, in Proceedings of NIPS, 2002, pp. 505–512
  17. D. Wettschereck, D. Aha, T. Mohri, A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms. Artif. Intell. Rev. 11, 273–314 (1997)
  18. A. Bar-Hillel, T. Hertz, N. Shental, D. Weinshall, Learning distance functions using equivalence relations, in Proceedings of International Conference on Machine Learning (ICML), Washington, DC, August 2003, pp. 11–18
  19. M. Nadler, E.P. Smith, Pattern Recognition Engineering (Wiley, New York, 1993)
  20. A. Ben-Hur, D. Horn, H.T. Siegelmann, V. Vapnik, Support vector clustering. J. Mach. Learn. Res. 2, 125–137 (2002)
  21. Z. Zhang, Learning metrics via discriminant kernels and multidimensional scaling: towards expected Euclidean representation, in Proceedings of International Conference on Machine Learning (ICML), August 2003, pp. 872–879
  22. A.R. Webb, Multidimensional scaling by iterative majorization using radial basis functions. Pattern Recognit. 28(5), 753–759 (1995)

Author information

Correspondence to Edward Y. Chang.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg and Tsinghua University Press

About this chapter

Cite this chapter

Chang, E.Y. (2011). Formulating Distance Functions. In: Foundations of Large-Scale Multimedia Information Management and Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20429-6_5

  • DOI: https://doi.org/10.1007/978-3-642-20429-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20428-9

  • Online ISBN: 978-3-642-20429-6

  • eBook Packages: Computer Science (R0)
