Abstract
Learning a desired distance metric from given training samples plays a significant role in the field of machine learning. In this chapter, we first present two novel metric learning methods based on the support vector machine (SVM). We then present a kernel classification framework for metric learning that can be implemented efficiently with standard SVM solvers. Novel kernel metric learning methods, such as the doublet-SVM and the triplet-SVM, are also introduced in this chapter.
Appendices
Appendix 1: The Dual of PCML
The original problem of PCML is formulated as
\[ \min_{\mathbf{M},b,\xi}\; \frac{1}{2}\left\|\mathbf{M}\right\|_F^2 + C\sum_{i,j}\xi_{ij} \quad \text{s.t.}\; h_{ij}\left(\left\langle\mathbf{M},\mathbf{X}_{ij}\right\rangle + b\right) \ge 1 - \xi_{ij},\; \xi_{ij} \ge 0,\; \forall i,j,\; \mathbf{M} \succcurlyeq 0. \]
Its Lagrangian is
\[ L = \frac{1}{2}\left\|\mathbf{M}\right\|_F^2 + C\sum_{i,j}\xi_{ij} - \sum_{i,j}\lambda_{ij}\left[h_{ij}\left(\left\langle\mathbf{M},\mathbf{X}_{ij}\right\rangle + b\right) - 1 + \xi_{ij}\right] - \sum_{i,j}\kappa_{ij}\xi_{ij} - \left\langle\mathbf{Y},\mathbf{M}\right\rangle, \]
where λ, κ, and Y are the Lagrange multipliers that satisfy \( \lambda_{ij} \ge 0 \), \( \kappa_{ij} \ge 0 \), ∀i, j, and \( \mathbf{Y} \succcurlyeq 0 \). Based on the KKT conditions, the original problem can be converted into the dual problem. The KKT conditions are defined as follows:
\[ \frac{\partial L}{\partial \mathbf{M}} = \mathbf{M} - \sum_{i,j}\lambda_{ij}h_{ij}\mathbf{X}_{ij} - \mathbf{Y} = 0, \quad \frac{\partial L}{\partial b} = -\sum_{i,j}\lambda_{ij}h_{ij} = 0, \quad \frac{\partial L}{\partial \xi_{ij}} = C - \lambda_{ij} - \kappa_{ij} = 0, \]
\[ \lambda_{ij}\left[h_{ij}\left(\left\langle\mathbf{M},\mathbf{X}_{ij}\right\rangle + b\right) - 1 + \xi_{ij}\right] = 0, \quad \kappa_{ij}\xi_{ij} = 0, \quad \left\langle\mathbf{Y},\mathbf{M}\right\rangle = 0. \]
Equation (2.62) implies the relationship between \( \lambda \), Y, and M as follows:
\[ \mathbf{M} = \sum_{i,j}\lambda_{ij}h_{ij}\mathbf{X}_{ij} + \mathbf{Y}. \]
Substituting Eqs. (2.62)–(2.69) back into the Lagrangian, we get the following Lagrange dual problem of PCML:
\[ \max_{\lambda,\mathbf{Y}}\; \sum_{i,j}\lambda_{ij} - \frac{1}{2}\Big\|\sum_{i,j}\lambda_{ij}h_{ij}\mathbf{X}_{ij} + \mathbf{Y}\Big\|_F^2 \quad \text{s.t.}\; 0 \le \lambda_{ij} \le C,\; \sum_{i,j}\lambda_{ij}h_{ij} = 0,\; \mathbf{Y} \succcurlyeq 0. \]
From Eqs. (2.68) and (2.69), we can see that matrix M is explicitly determined by the training procedure, whereas b is not. Nevertheless, b can easily be obtained by using the KKT complementarity conditions in Eqs. (2.64) and (2.67), which show that \( \xi_{ij} = 0 \) if \( \lambda_{ij} < C \), and \( h_{ij}\left(\left\langle\mathbf{M},\mathbf{X}_{ij}\right\rangle + b\right) - 1 + \xi_{ij} = 0 \) if \( \lambda_{ij} > 0 \). Thus, we can simply take any training pair for which \( 0 < \lambda_{ij} < C \) to calculate b by
\[ b = h_{ij} - \left\langle\mathbf{M},\mathbf{X}_{ij}\right\rangle, \]
where we use the fact that \( 1/h_{ij} = h_{ij} \) since \( h_{ij} \in \{-1, +1\} \).
Note that it is more robust to average over all such training pairs. After obtaining b, we can calculate \( \xi_{ij} \) by
\[ \xi_{ij} = \left[1 - h_{ij}\left(\left\langle\mathbf{M},\mathbf{X}_{ij}\right\rangle + b\right)\right]_+ , \]
where \( [z]_+ = \max(z, 0) \) denotes the standard hinge loss.
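To make this recovery step concrete, here is a minimal NumPy sketch, assuming a solver has already produced the PCML dual solution; the array names, the doublet layout, and the averaging over margin pairs are our own illustrative choices, not the chapter's code:

```python
import numpy as np

def recover_pcml_solution(X_doublets, h, lam, Y, C, tol=1e-8):
    """Recover M, b, and xi from a PCML dual solution.

    X_doublets: (n, d, d) array with X_ij = (x_i - x_j)(x_i - x_j)^T
    h: (n,) labels in {-1, +1}; lam: (n,) dual variables; Y: (d, d) PSD matrix.
    """
    # M = sum_ij lambda_ij h_ij X_ij + Y  (stationarity w.r.t. M)
    M = np.einsum('l,ldk->dk', lam * h, X_doublets) + Y

    # <M, X_ij> for every doublet
    margins = np.einsum('dk,ldk->l', M, X_doublets)

    # b from pairs with 0 < lambda_ij < C, averaged for robustness
    on_margin = (lam > tol) & (lam < C - tol)
    if not np.any(on_margin):
        raise ValueError("no pair with 0 < lambda_ij < C")
    b = np.mean(h[on_margin] - margins[on_margin])

    # xi_ij = [1 - h_ij(<M, X_ij> + b)]_+
    xi = np.maximum(1.0 - h * (margins + b), 0.0)
    return M, b, xi
```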
Appendix 2: The Dual of NCML
The primal problem of NCML is as follows:
\[ \min_{\alpha,b,\xi}\; \frac{1}{2}\Big\|\sum_{k,l}\alpha_{kl}\mathbf{X}_{kl}\Big\|_F^2 + C\sum_{i,j}\xi_{ij} \quad \text{s.t.}\; h_{ij}\Big(\sum_{k,l}\alpha_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + b\Big) \ge 1 - \xi_{ij},\; \xi_{ij} \ge 0,\; \alpha_{ij} \ge 0,\; \forall i,j, \]
where \( \mathbf{M} = \sum\nolimits_{k,l}\alpha_{kl}\mathbf{X}_{kl} \); the nonnegativity of the coefficients guarantees \( \mathbf{M} \succcurlyeq 0 \) because each \( \mathbf{X}_{kl} \) is positive semidefinite.
Its Lagrangian can be defined as
\[ L = \frac{1}{2}\Big\|\sum_{k,l}\alpha_{kl}\mathbf{X}_{kl}\Big\|_F^2 + C\sum_{i,j}\xi_{ij} - \sum_{i,j}\beta_{ij}\Big[h_{ij}\Big(\sum_{k,l}\alpha_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + b\Big) - 1 + \xi_{ij}\Big] - \sum_{i,j}\nu_{ij}\xi_{ij} - \sum_{i,j}\sigma_{ij}\alpha_{ij}, \]
where β, σ, and ν are the Lagrange multipliers that satisfy \( \beta_{ij} \ge 0 \), \( \sigma_{ij} \ge 0 \), and \( \nu_{ij} \ge 0 \), ∀i, j. Converting the original problem to its dual problem requires the following KKT conditions:
\[ \frac{\partial L}{\partial \alpha_{ij}} = \sum_{k,l}\alpha_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle - \sum_{k,l}\beta_{kl}h_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle - \sigma_{ij} = 0, \quad \frac{\partial L}{\partial b} = -\sum_{i,j}\beta_{ij}h_{ij} = 0, \quad \frac{\partial L}{\partial \xi_{ij}} = C - \beta_{ij} - \nu_{ij} = 0, \]
\[ \beta_{ij}\Big[h_{ij}\Big(\sum_{k,l}\alpha_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + b\Big) - 1 + \xi_{ij}\Big] = 0, \quad \nu_{ij}\xi_{ij} = 0, \quad \sigma_{ij}\alpha_{ij} = 0. \]
Here, we introduce a coefficient vector η that satisfies \( \sigma_{ij} = \sum\nolimits_{k,l}\eta_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \), where \( \left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \) denotes a positive definite kernel. Thus, we can guarantee that every η has a unique corresponding \( \sigma \), and vice versa. According to Eq. (2.74), the relationship between α, β, and η is
\[ \alpha_{ij} = \beta_{ij}h_{ij} + \eta_{ij}. \]
Substituting Eqs. (2.74)–(2.76) back into the Lagrangian, the Lagrange dual problem of NCML can be rewritten as follows:
\[ \max_{\beta,\eta}\; \sum_{i,j}\beta_{ij} - \frac{1}{2}\sum_{i,j}\sum_{k,l}\left(\beta_{ij}h_{ij} + \eta_{ij}\right)\left(\beta_{kl}h_{kl} + \eta_{kl}\right)\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \quad \text{s.t.}\; 0 \le \beta_{ij} \le C,\; \eta_{ij} \ge 0,\; \sum_{i,j}\beta_{ij}h_{ij} = 0. \]
Analogous to PCML, we can use the KKT complementarity condition in Eq. (2.75) to compute b and \( \xi_{ij} \) in NCML. Equations (2.76) and (2.79) show that \( \xi_{ij} = 0 \) if \( \beta_{ij} < C \), and \( h_{ij}\left(\sum\nolimits_{k,l}\alpha_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + b\right) - 1 + \xi_{ij} = 0 \) if \( \beta_{ij} > 0 \). With any training pair for which \( 0 < \beta_{ij} < C \), b can be obtained by
\[ b = h_{ij} - \sum_{k,l}\alpha_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle. \]
Then, \( \xi_{ij} \) can be obtained by
\[ \xi_{ij} = \Big[1 - h_{ij}\Big(\sum_{k,l}\alpha_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + b\Big)\Big]_+ , \]
where \( [z]_+ = \max(z, 0) \) denotes the standard hinge loss.
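The following NumPy sketch illustrates how a solution of the NCML dual yields α, M, b, and ξ; as before, the names and the averaging step are illustrative assumptions rather than the chapter's implementation:

```python
import numpy as np

def recover_ncml_solution(X_doublets, h, beta, eta, C, tol=1e-8):
    """Recover alpha, M, b, and xi from an NCML dual solution (beta, eta).

    X_doublets: (n, d, d) array of doublet matrices X_ij.
    h: (n,) labels in {-1, +1}; beta, eta: (n,) dual variables.
    """
    # alpha_ij = beta_ij * h_ij + eta_ij  (stationarity plus the kernel
    # parameterization of sigma)
    alpha = beta * h + eta

    # M = sum_kl alpha_kl X_kl is PSD since alpha >= 0 and each X_kl is PSD
    M = np.einsum('l,ldk->dk', alpha, X_doublets)

    # <M, X_ij> = sum_kl alpha_kl <X_ij, X_kl>
    margins = np.einsum('dk,ldk->l', M, X_doublets)

    # b from pairs with 0 < beta_ij < C, averaged for numerical robustness
    on_margin = (beta > tol) & (beta < C - tol)
    if not np.any(on_margin):
        raise ValueError("no pair with 0 < beta_ij < C")
    b = np.mean(h[on_margin] - margins[on_margin])

    # xi_ij = [1 - h_ij(<M, X_ij> + b)]_+
    xi = np.maximum(1.0 - h * (margins + b), 0.0)
    return alpha, M, b, xi
```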
Appendix 3: The Dual of the Subproblem of η in NCML
The subproblem of η, with β fixed, is defined as follows:
\[ \min_{\eta}\; \frac{1}{2}\sum_{i,j}\sum_{k,l}\eta_{ij}\eta_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + \sum_{i,j}\gamma_{ij}\eta_{ij} \quad \text{s.t.}\; \eta_{ij} \ge 0,\; \forall i,j, \]
where \( \gamma_{ij} = \sum\nolimits_{k,l}\beta_{kl}h_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \). Its Lagrangian is
\[ L = \frac{1}{2}\sum_{i,j}\sum_{k,l}\eta_{ij}\eta_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + \sum_{i,j}\gamma_{ij}\eta_{ij} - \sum_{i,j}\mu_{ij}\eta_{ij}, \]
where μ is the Lagrange multiplier that satisfies \( \mu_{ij} \ge 0 \), ∀i, j. Converting this problem to its dual requires the following KKT condition:
\[ \frac{\partial L}{\partial \eta_{ij}} = \sum_{k,l}\eta_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + \gamma_{ij} - \mu_{ij} = 0. \]
According to Eq. (2.86), the relationship between μ, η, and β is
\[ \mu_{ij} = \sum_{k,l}\left(\eta_{kl} + \beta_{kl}h_{kl}\right)\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle. \]
Parameterizing the dual variable analogously as \( \mu_{ij} = \sum\nolimits_{k,l}\nu_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \) and substituting Eqs. (2.86) and (2.87) back into the Lagrangian, we get the following Lagrange dual problem of the subproblem of η:
\[ \max_{\nu}\; -\frac{1}{2}\sum_{i,j}\sum_{k,l}\nu_{ij}\nu_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + \sum_{i,j}\sum_{k,l}\nu_{ij}\beta_{kl}h_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle - \frac{1}{2}\sum_{i,j}\sum_{k,l}\beta_{ij}\beta_{kl}h_{ij}h_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \quad \text{s.t.}\; \nu_{ij} \ge 0,\; \forall i,j. \]
Since β is fixed in this subproblem, the term \( \sum\nolimits_{i,j}\sum\nolimits_{k,l}\beta_{ij}\beta_{kl}h_{ij}h_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \) remains constant in Eq. (2.88). Thus, we can omit this term and derive the simplified Lagrange dual problem as follows:
\[ \max_{\nu}\; -\frac{1}{2}\sum_{i,j}\sum_{k,l}\nu_{ij}\nu_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + \sum_{i,j}\gamma_{ij}\nu_{ij} \quad \text{s.t.}\; \nu_{ij} \ge 0,\; \forall i,j. \]
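As an illustration (not the solver used in the chapter), the simplified dual above is a QP with only nonnegativity constraints, so a few lines of projected gradient ascent suffice for a small problem. Here `K` is the kernel matrix with entries \( \left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \) and `gamma` is the vector γ; both names are our own:

```python
import numpy as np

def solve_eta_subproblem(K, gamma, n_iter=5000, tol=1e-10):
    """Projected gradient ascent for max_v -0.5 v^T K v + gamma^T v, v >= 0.

    K: (n, n) PSD kernel matrix with entries <X_ij, X_kl>.
    gamma: (n,) vector with gamma_ij = sum_kl beta_kl h_kl <X_ij, X_kl>.
    """
    n = K.shape[0]
    v = np.zeros(n)
    # Step size from the Lipschitz constant of the gradient
    step = 1.0 / (np.linalg.norm(K, 2) + 1e-12)
    for _ in range(n_iter):
        grad = gamma - K @ v                       # gradient of the concave objective
        v_new = np.maximum(v + step * grad, 0.0)   # project onto v >= 0
        if np.linalg.norm(v_new - v) < tol:
            v = v_new
            break
        v = v_new
    return v
```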
Appendix 4: The Dual of the Doublet-SVM
According to the original problem of the doublet-SVM defined in Eq. (2.56), its Lagrangian can be defined as follows:
\[ L = \frac{1}{2}\left\|\mathbf{M}\right\|_F^2 + C\sum_{l}\xi_{l} - \sum_{l}\alpha_{l}\left[h_{l}\left(\left\langle\mathbf{M},\mathbf{X}_{l}\right\rangle + b\right) - 1 + \xi_{l}\right] - \sum_{l}\beta_{l}\xi_{l}, \]
where α and β are the Lagrange multipliers that satisfy \( \alpha_{l} \ge 0 \) and \( \beta_{l} \ge 0 \), ∀l. Converting the original problem to its dual requires the following KKT conditions:
\[ \frac{\partial L}{\partial \mathbf{M}} = \mathbf{M} - \sum_{l}\alpha_{l}h_{l}\mathbf{X}_{l} = 0, \quad \frac{\partial L}{\partial b} = -\sum_{l}\alpha_{l}h_{l} = 0, \quad \frac{\partial L}{\partial \xi_{l}} = C - \alpha_{l} - \beta_{l} = 0. \]
According to Eq. (2.91), the relationship between M and α is
\[ \mathbf{M} = \sum_{l}\alpha_{l}h_{l}\mathbf{X}_{l}. \]
Substituting Eqs. (2.91)–(2.93) back into the Lagrangian, we have
\[ L = \sum_{l}\alpha_{l} - \frac{1}{2}\sum_{l}\sum_{m}\alpha_{l}\alpha_{m}h_{l}h_{m}\left\langle\mathbf{X}_{l},\mathbf{X}_{m}\right\rangle. \]
Thus, the dual problem of the doublet-SVM can be formulated as follows:
\[ \max_{\alpha}\; \sum_{l}\alpha_{l} - \frac{1}{2}\sum_{l}\sum_{m}\alpha_{l}\alpha_{m}h_{l}h_{m}\left\langle\mathbf{X}_{l},\mathbf{X}_{m}\right\rangle \quad \text{s.t.}\; \sum_{l}\alpha_{l}h_{l} = 0,\; 0 \le \alpha_{l} \le C. \]
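This dual has exactly the form of a standard SVM dual with the linear kernel \( \left\langle\mathbf{X}_{l},\mathbf{X}_{m}\right\rangle \) over doublets, which is why it can be solved with off-the-shelf SVM solvers. A minimal sketch using scikit-learn's precomputed-kernel interface follows; the wrapper and its names are our own illustration, not the chapter's code:

```python
import numpy as np
from sklearn.svm import SVC

def train_doublet_svm(X_doublets, h, C=1.0):
    """Train the doublet-SVM dual with a standard SVM solver.

    X_doublets: (n, d, d) doublet matrices X_l; h: (n,) labels in {-1, +1}.
    """
    n = X_doublets.shape[0]
    flat = X_doublets.reshape(n, -1)
    K = flat @ flat.T                      # K[l, m] = <X_l, X_m>

    svm = SVC(C=C, kernel='precomputed')
    svm.fit(K, h)

    # M = sum_l alpha_l h_l X_l; sklearn's dual_coef_ stores alpha_l * h_l
    coef = np.zeros(n)
    coef[svm.support_] = svm.dual_coef_.ravel()
    M = np.einsum('l,ldk->dk', coef, X_doublets)
    return M, svm.intercept_[0]
```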
Appendix 5: The Dual of the Triplet-SVM
According to the original problem of the triplet-SVM in Eq. (2.58), its Lagrangian can be defined as follows:
\[ L = \frac{1}{2}\left\|\mathbf{M}\right\|_F^2 + C\sum_{l}\xi_{l} - \sum_{l}\alpha_{l}\left[\left\langle\mathbf{M},\mathbf{X}_{l}\right\rangle - 1 + \xi_{l}\right] - \sum_{l}\beta_{l}\xi_{l}, \]
where α and β are the Lagrange multipliers. To convert the original problem to its dual, we set the derivatives of the Lagrangian to zero, which yields the following KKT conditions:
\[ \frac{\partial L}{\partial \mathbf{M}} = \mathbf{M} - \sum_{l}\alpha_{l}\mathbf{X}_{l} = 0, \quad \frac{\partial L}{\partial \xi_{l}} = C - \alpha_{l} - \beta_{l} = 0. \]
According to Eq. (2.98), the relationship between M and \( \alpha \) is
\[ \mathbf{M} = \sum_{l}\alpha_{l}\mathbf{X}_{l}. \]
Substituting Eqs. (2.98) and (2.99) back into the Lagrangian, we get
\[ L = \sum_{l}\alpha_{l} - \frac{1}{2}\sum_{l}\sum_{m}\alpha_{l}\alpha_{m}\left\langle\mathbf{X}_{l},\mathbf{X}_{m}\right\rangle. \]
Thus, the dual problem of the triplet-SVM can be rewritten as follows:
\[ \max_{\alpha}\; \sum_{l}\alpha_{l} - \frac{1}{2}\sum_{l}\sum_{m}\alpha_{l}\alpha_{m}\left\langle\mathbf{X}_{l},\mathbf{X}_{m}\right\rangle \quad \text{s.t.}\; 0 \le \alpha_{l} \le C. \]
Note that, unlike the doublet-SVM, there is no equality constraint here, because the triplet-SVM primal has no bias term.
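Because the triplet-SVM dual has only box constraints, simple projected gradient ascent is enough to illustrate it. The sketch below uses our own variable names and assumes the triplet matrices \( \mathbf{X}_{l} \) have already been built as in Eq. (2.58); it is a sketch under those assumptions, not the chapter's solver:

```python
import numpy as np

def train_triplet_svm(T_triplets, C=1.0, n_iter=5000, tol=1e-10):
    """Solve max_a sum(a) - 0.5 a^T K a, s.t. 0 <= a <= C, then form M.

    T_triplets: (n, d, d) array of triplet matrices X_l from Eq. (2.58).
    """
    n = T_triplets.shape[0]
    flat = T_triplets.reshape(n, -1)
    K = flat @ flat.T                      # K[l, m] = <X_l, X_m>

    a = np.zeros(n)
    step = 1.0 / (np.linalg.norm(K, 2) + 1e-12)
    for _ in range(n_iter):
        grad = 1.0 - K @ a                          # gradient of the dual objective
        a_new = np.clip(a + step * grad, 0.0, C)    # project onto the box [0, C]
        if np.linalg.norm(a_new - a) < tol:
            a = a_new
            break
        a = a_new

    # M = sum_l alpha_l X_l
    M = np.einsum('l,ldk->dk', a, T_triplets)
    return M, a
```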