
Knowledge and Information Systems, Volume 28, Issue 1, pp 25–45

Generalized sparse metric learning with relative comparisons

  • Kaizhu Huang
  • Yiming Ying
  • Colin Campbell
Regular Paper

Abstract

The objective of sparse metric learning is to learn a distance measure from a set of data while simultaneously finding a low-dimensional representation. Despite demonstrated success, the performance of existing sparse metric learning approaches is usually limited because these methods either rely on certain problem relaxations or target the sparse metric learning objective only indirectly. In this paper, we propose a Generalized Sparse Metric Learning (GSML) method. This novel framework offers a unified view for understanding many existing sparse metric learning algorithms, including the sparse metric learning framework of Rosales and Fung (ACM international conference on knowledge discovery and data mining (KDD), pp 367–373, 2006), Large Margin Nearest Neighbor (Weinberger et al. in Advances in neural information processing systems (NIPS), 2006; Weinberger and Saul in Proceedings of the twenty-fifth international conference on machine learning (ICML-2008), 2008), and the D-ranking Vector Machine (D-ranking VM) (Ouyang and Gray in Proceedings of the twenty-fifth international conference on machine learning (ICML-2008), 2008). Moreover, GSML establishes a close relationship with the Pairwise Support Vector Machine (Vert et al. in BMC Bioinform, 8, 2007). Furthermore, the proposed framework can extend many current non-sparse metric learning models to their sparse versions, including Relevant Component Analysis (Bar-Hillel et al. in J Mach Learn Res, 6:937–965, 2005) and a state-of-the-art method of Xing et al. (Advances in neural information processing systems (NIPS), 2002). We present the detailed framework, provide theoretical justifications, build various connections with other models, and propose an iterative optimization method, making the framework both theoretically important and practically scalable for medium and large datasets.
Experimental results on seven publicly available datasets show that the generalized framework outperforms six state-of-the-art methods, achieving higher accuracy with significantly smaller dimensionality.
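The setting described above — learning a Mahalanobis matrix M from relative comparisons of the form "x_i is closer to x_j than to x_k" while encouraging a low-dimensional (sparse) solution — can be sketched with a generic projected-subgradient recipe. This is an illustrative stand-in, not the paper's GSML algorithm: it uses a hinge loss on triplets, a trace penalty as the sparsity-inducing term, and a PSD projection via eigenvalue clipping; all function names are hypothetical.

```python
import numpy as np

def mahalanobis_sq(M, x, y):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = x - y
    return d @ M @ d

def learn_sparse_metric(X, triplets, lam=0.1, lr=0.01, iters=200):
    """Projected subgradient descent on
        sum_t hinge(1 + d_M(i,j) - d_M(i,k)) + lam * trace(M),
    where each triplet (i, j, k) states that X[i] should be closer
    to X[j] than to X[k] under the learned metric M (PSD)."""
    n, p = X.shape
    M = np.eye(p)
    for _ in range(iters):
        G = lam * np.eye(p)  # gradient of the trace (sparsity) penalty
        for (i, j, k) in triplets:
            dij, dik = X[i] - X[j], X[i] - X[k]
            # subgradient of the hinge term, active when the margin is violated
            if 1 + dij @ M @ dij - dik @ M @ dik > 0:
                G += np.outer(dij, dij) - np.outer(dik, dik)
        M -= lr * G
        # project back onto the PSD cone by clipping negative eigenvalues
        w, V = np.linalg.eigh(M)
        M = (V * np.clip(w, 0.0, None)) @ V.T
    return M
```

The trace penalty drives small eigenvalues of M to zero, so a low-rank M — and hence a low-dimensional representation — emerges when the comparisons can be satisfied in a subspace.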

Keywords

Generalized framework · Metric learning · Sparse


References

  1. Agarwal S, Wills J, Gayton L, Lanckriet G, Kriegman D, Belongie S (2008) Generalized non-metric multidimensional scaling. In: International conference on artificial intelligence and statistics (AISTATS’08)
  2. Argyriou A, Evgeniou T, Pontil M (2006) Multi-task feature learning. In: Advances in neural information processing systems (NIPS) 18
  3. Asuncion A, Newman D (2007) UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html
  4. Athitsos V, Alton J, Sclaroff S, Kollios G (2004) BoostMap: a method for efficient approximate similarity rankings. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition (CVPR)
  5. Bar-Hillel A, Hertz T, Shental N, Weinshall D (2005) Learning a Mahalanobis metric from equivalence constraints. J Mach Learn Res 6:937–965
  6. Chopra S, Hadsell R, LeCun Y (2005) Learning a similarity metric discriminatively, with application to face verification. In: Proceedings of 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR-2005)
  7. Cover TM, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory IT-13(1):21–27
  8. Cox T, Cox M (1994) Multidimensional scaling. Chapman & Hall, London
  9. Davis J, Kulis B, Jain P, Sra S, Dhillon I (2007) Information-theoretic metric learning. In: International conference on machine learning (ICML)
  10. Fung G, Mangasarian OL, Smola AJ (2002) Minimal kernel classifiers. J Mach Learn Res 3:303–321
  11. Fung G, Rosales R, Rao RB (2007) Feature selection and kernel design via linear programming. In: Proceedings of the international joint conference on artificial intelligence (IJCAI), pp 786–791
  12. Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2004) Neighbourhood components analysis. In: Advances in neural information processing systems (NIPS)
  13. Hastie T, Tibshirani R, Friedman J (2003) The elements of statistical learning. Springer, New York
  14. Huang K, Yang H, King I, Lyu MR (2004) Learning classifiers from imbalanced data based on biased minimax probability machine. In: Proceedings of 2004 IEEE computer society conference on computer vision and pattern recognition (CVPR-2004), vol 2, pp 558–563
  15. Huang K, Yang H, King I, Lyu MR (2008) Maxi-min margin machine: learning large margin classifiers locally and globally. IEEE Trans Neural Netw 19:260–272
  16. Huang K, Yang H, King I, Lyu MR, Chan L (2004) The minimum error minimax probability machine. J Mach Learn Res 5:1253–1286
  17. Jolliffe IT (1989) Principal component analysis. Springer, New York
  18. Micchelli CA, Pontil M (2005) Learning the kernel function via regularization. J Mach Learn Res 6:1099–1125
  19. Nesterov Y (2003) Introductory lectures on convex optimization: a basic course. Springer, New York
  20. Nesterov Y (2005) Smooth minimization of non-smooth functions. Math Program 103(1):127–152
  21. Ouyang H, Gray A (2008) Learning dissimilarities by ranking: from SDP to QP. In: Proceedings of the twenty-fifth international conference on machine learning (ICML-2008)
  22. Pfitzner D, Leibbrandt R, Powers D (2009) Characterization and evaluation of similarity measures for pairs of clusterings. Knowl Inf Syst 19:361–394
  23. Quan X, Liu G, Lu Z, Ni X, Liu W (2010) Short text similarity based on probabilistic topics. Knowl Inf Syst
  24. Rosales R, Fung G (2006) Learning sparse metrics via linear programming. In: ACM international conference on knowledge discovery and data mining (KDD), pp 367–373
  25. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
  26. Schölkopf B, Smola A (2002) Learning with kernels. MIT Press, Cambridge
  27. Schultz M, Joachims T (2003) Learning a distance metric from relative comparisons. In: Advances in neural information processing systems (NIPS)
  28. Song G, Cui B, Zheng B, Xie K, Yang D (2009) Accelerating sequence searching: dimensionality reduction method. Knowl Inf Syst 20:301–322
  29. Song L, Smola A, Borgwardt K, Gretton A (2008) Colored maximum variance unfolding. In: Advances in neural information processing systems (NIPS)
  30. Sugiyama M (2007) Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. J Mach Learn Res 8:1027–1061
  31. Torresani L, Lee K (2007) Large margin component analysis. In: Advances in neural information processing systems (NIPS)
  32. Vert J-P, Qiu J, Noble WS (2007) A new pairwise kernel for biological network inference with support vector machines. BMC Bioinform 8
  33. Wagstaff K, Cardie C, Rogers S, Schroedl S (2001) Constrained k-means clustering with background knowledge. In: International conference on machine learning (ICML)
  34. Weinberger K, Blitzer J, Saul L (2006) Distance metric learning for large margin nearest neighbor classification. In: Advances in neural information processing systems (NIPS)
  35. Weinberger K, Saul L (2008) Fast solvers and efficient implementations for distance metric learning. In: Proceedings of the twenty-fifth international conference on machine learning (ICML-2008)
  36. Xing E, Ng A, Jordan M, Russell S (2002) Distance metric learning, with application to clustering with side information. In: Advances in neural information processing systems (NIPS)
  37. Yang L, Jin R (2006) Distance metric learning: a comprehensive survey. Technical report, Department of Computer Science and Engineering, Michigan State University

Copyright information

© Springer-Verlag London Limited 2010

Authors and Affiliations

  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. School of Engineering, Computing and Mathematics, University of Exeter, Exeter, UK
  3. Department of Engineering Mathematics, University of Bristol, Bristol, UK
