Abstract
Learning a desired distance metric from given training samples plays a significant role in the field of machine learning. In this chapter, we first present two novel metric learning methods based on the support vector machine (SVM). We then present a kernel classification framework for metric learning that can be implemented efficiently with standard SVM solvers. Novel kernel metric learning methods, such as the doublet-SVM and the triplet-SVM, are also introduced in this chapter.
Appendices
Appendix 1: The Dual of PCML
The original problem of PCML is formulated as
\[ \min_{\mathbf{M},b,\xi}\; \frac{1}{2}\left\|\mathbf{M}\right\|_F^2 + C\sum_{i,j}\xi_{ij} \quad \text{s.t.}\; h_{ij}\left(\left\langle\mathbf{M},\mathbf{X}_{ij}\right\rangle + b\right) \ge 1 - \xi_{ij},\; \xi_{ij} \ge 0,\; \forall i,j,\; \mathbf{M} \succcurlyeq 0. \]
Its Lagrangian is
\[ L = \frac{1}{2}\left\|\mathbf{M}\right\|_F^2 + C\sum_{i,j}\xi_{ij} - \sum_{i,j}\lambda_{ij}\left[h_{ij}\left(\left\langle\mathbf{M},\mathbf{X}_{ij}\right\rangle + b\right) - 1 + \xi_{ij}\right] - \sum_{i,j}\kappa_{ij}\xi_{ij} - \left\langle\mathbf{Y},\mathbf{M}\right\rangle, \]
where λ, κ, and Y are the Lagrange multipliers that satisfy \( \lambda_{ij} \ge 0 \), \( \kappa_{ij} \ge 0 \), ∀i, j, and \( \mathbf{Y} \succcurlyeq 0 \). Based on the KKT conditions, the original problem can be converted into the dual problem. The KKT conditions are defined as follows:
\[ \frac{\partial L}{\partial \mathbf{M}} = \mathbf{M} - \sum_{i,j}\lambda_{ij}h_{ij}\mathbf{X}_{ij} - \mathbf{Y} = 0, \quad \frac{\partial L}{\partial b} = -\sum_{i,j}\lambda_{ij}h_{ij} = 0, \quad \frac{\partial L}{\partial \xi_{ij}} = C - \lambda_{ij} - \kappa_{ij} = 0, \]
\[ \lambda_{ij}\left[h_{ij}\left(\left\langle\mathbf{M},\mathbf{X}_{ij}\right\rangle + b\right) - 1 + \xi_{ij}\right] = 0, \quad \kappa_{ij}\xi_{ij} = 0, \quad \left\langle\mathbf{Y},\mathbf{M}\right\rangle = 0. \]
Equation (2.62) implies the relationship between \( \lambda \), Y, and M as follows:
\[ \mathbf{M} = \sum_{i,j}\lambda_{ij}h_{ij}\mathbf{X}_{ij} + \mathbf{Y}. \]
Substituting Eqs. (2.62)–(2.69) back into the Lagrangian, we get the following Lagrange dual problem of PCML:
\[ \max_{\lambda,\mathbf{Y}}\; \sum_{i,j}\lambda_{ij} - \frac{1}{2}\Big\|\sum_{i,j}\lambda_{ij}h_{ij}\mathbf{X}_{ij} + \mathbf{Y}\Big\|_F^2 \quad \text{s.t.}\; 0 \le \lambda_{ij} \le C,\; \sum_{i,j}\lambda_{ij}h_{ij} = 0,\; \mathbf{Y} \succcurlyeq 0. \]
From Eqs. (2.68) and (2.69), we can see that matrix M is explicitly determined by the training procedure, whereas b is not. Nevertheless, b can easily be obtained by using the KKT complementarity conditions in Eqs. (2.64) and (2.67), which show that \( \xi_{ij} = 0 \) if \( \lambda_{ij} < C \), and \( h_{ij}\left(\left\langle\mathbf{M},\mathbf{X}_{ij}\right\rangle + b\right) - 1 + \xi_{ij} = 0 \) if \( \lambda_{ij} > 0 \). Thus, we can simply take any training pair for which \( 0 < \lambda_{ij} < C \) to calculate b by
\[ b = h_{ij} - \left\langle\mathbf{M},\mathbf{X}_{ij}\right\rangle, \]
where we use the fact that \( 1/h_{ij} = h_{ij} \) since \( h_{ij} \in \{-1, +1\} \).
Note that it is more robust to average over all such training pairs. After obtaining b, we can calculate \( \xi_{ij} \) by
\[ \xi_{ij} = \left[1 - h_{ij}\left(\left\langle\mathbf{M},\mathbf{X}_{ij}\right\rangle + b\right)\right]_+ , \]
where \( [z]_+ = \max(z, 0) \) denotes the standard hinge loss.
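To make this recovery step concrete, here is a minimal NumPy sketch, assuming a solver has already produced the PCML dual solution; the array names, the doublet layout, and the averaging over margin pairs are our own illustrative choices, not the chapter's code:

```python
import numpy as np

def recover_pcml_solution(X_doublets, h, lam, Y, C, tol=1e-8):
    """Recover M, b, and xi from a PCML dual solution.

    X_doublets: (n, d, d) array with X_ij = (x_i - x_j)(x_i - x_j)^T
    h: (n,) labels in {-1, +1}; lam: (n,) dual variables; Y: (d, d) PSD matrix.
    """
    # M = sum_ij lambda_ij h_ij X_ij + Y  (stationarity w.r.t. M)
    M = np.einsum('l,ldk->dk', lam * h, X_doublets) + Y

    # <M, X_ij> for every doublet
    margins = np.einsum('dk,ldk->l', M, X_doublets)

    # b from pairs with 0 < lambda_ij < C, averaged for robustness
    on_margin = (lam > tol) & (lam < C - tol)
    if not np.any(on_margin):
        raise ValueError("no pair with 0 < lambda_ij < C")
    b = np.mean(h[on_margin] - margins[on_margin])

    # xi_ij = [1 - h_ij(<M, X_ij> + b)]_+
    xi = np.maximum(1.0 - h * (margins + b), 0.0)
    return M, b, xi
```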
Appendix 2: The Dual of NCML
The primal problem of NCML is as follows:
\[ \min_{\alpha,b,\xi}\; \frac{1}{2}\Big\|\sum_{k,l}\alpha_{kl}\mathbf{X}_{kl}\Big\|_F^2 + C\sum_{i,j}\xi_{ij} \quad \text{s.t.}\; h_{ij}\Big(\sum_{k,l}\alpha_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + b\Big) \ge 1 - \xi_{ij},\; \xi_{ij} \ge 0,\; \alpha_{ij} \ge 0,\; \forall i,j, \]
where \( \mathbf{M} = \sum\nolimits_{k,l}\alpha_{kl}\mathbf{X}_{kl} \); the nonnegativity of the coefficients guarantees \( \mathbf{M} \succcurlyeq 0 \) because each \( \mathbf{X}_{kl} \) is positive semidefinite.
Its Lagrangian can be defined as
\[ L = \frac{1}{2}\Big\|\sum_{k,l}\alpha_{kl}\mathbf{X}_{kl}\Big\|_F^2 + C\sum_{i,j}\xi_{ij} - \sum_{i,j}\beta_{ij}\Big[h_{ij}\Big(\sum_{k,l}\alpha_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + b\Big) - 1 + \xi_{ij}\Big] - \sum_{i,j}\nu_{ij}\xi_{ij} - \sum_{i,j}\sigma_{ij}\alpha_{ij}, \]
where β, σ, and ν are the Lagrange multipliers that satisfy \( \beta_{ij} \ge 0 \), \( \sigma_{ij} \ge 0 \), and \( \nu_{ij} \ge 0 \), ∀i, j. Converting the original problem to its dual problem requires the following KKT conditions:
\[ \frac{\partial L}{\partial \alpha_{ij}} = \sum_{k,l}\alpha_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle - \sum_{k,l}\beta_{kl}h_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle - \sigma_{ij} = 0, \quad \frac{\partial L}{\partial b} = -\sum_{i,j}\beta_{ij}h_{ij} = 0, \quad \frac{\partial L}{\partial \xi_{ij}} = C - \beta_{ij} - \nu_{ij} = 0, \]
\[ \beta_{ij}\Big[h_{ij}\Big(\sum_{k,l}\alpha_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + b\Big) - 1 + \xi_{ij}\Big] = 0, \quad \nu_{ij}\xi_{ij} = 0, \quad \sigma_{ij}\alpha_{ij} = 0. \]
Here, we introduce a coefficient vector η that satisfies \( \sigma_{ij} = \sum\nolimits_{k,l}\eta_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \), where \( \left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \) denotes a positive definite kernel. Thus, we can guarantee that every η has a unique corresponding \( \sigma \), and vice versa. According to Eq. (2.74), the relationship between α, β, and η is
\[ \alpha_{ij} = \beta_{ij}h_{ij} + \eta_{ij}. \]
Substituting Eqs. (2.74)–(2.76) back into the Lagrangian, the Lagrange dual problem of NCML can be rewritten as follows:
\[ \max_{\beta,\eta}\; \sum_{i,j}\beta_{ij} - \frac{1}{2}\sum_{i,j}\sum_{k,l}\left(\beta_{ij}h_{ij} + \eta_{ij}\right)\left(\beta_{kl}h_{kl} + \eta_{kl}\right)\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \quad \text{s.t.}\; 0 \le \beta_{ij} \le C,\; \eta_{ij} \ge 0,\; \sum_{i,j}\beta_{ij}h_{ij} = 0. \]
Analogous to PCML, we can use the KKT complementarity condition in Eq. (2.75) to compute b and \( \xi_{ij} \) in NCML. Equations (2.76) and (2.79) show that \( \xi_{ij} = 0 \) if \( \beta_{ij} < C \), and \( h_{ij}\left(\sum\nolimits_{k,l}\alpha_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + b\right) - 1 + \xi_{ij} = 0 \) if \( \beta_{ij} > 0 \). With any training pair for which \( 0 < \beta_{ij} < C \), b can be obtained by
\[ b = h_{ij} - \sum_{k,l}\alpha_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle. \]
Then, \( \xi_{ij} \) can be obtained by
\[ \xi_{ij} = \Big[1 - h_{ij}\Big(\sum_{k,l}\alpha_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + b\Big)\Big]_+ , \]
where \( [z]_+ = \max(z, 0) \) denotes the standard hinge loss.
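The following NumPy sketch illustrates how a solution of the NCML dual yields α, M, b, and ξ; as before, the names and the averaging step are illustrative assumptions rather than the chapter's implementation:

```python
import numpy as np

def recover_ncml_solution(X_doublets, h, beta, eta, C, tol=1e-8):
    """Recover alpha, M, b, and xi from an NCML dual solution (beta, eta).

    X_doublets: (n, d, d) array of doublet matrices X_ij.
    h: (n,) labels in {-1, +1}; beta, eta: (n,) dual variables.
    """
    # alpha_ij = beta_ij * h_ij + eta_ij  (stationarity plus the kernel
    # parameterization of sigma)
    alpha = beta * h + eta

    # M = sum_kl alpha_kl X_kl is PSD since alpha >= 0 and each X_kl is PSD
    M = np.einsum('l,ldk->dk', alpha, X_doublets)

    # <M, X_ij> = sum_kl alpha_kl <X_ij, X_kl>
    margins = np.einsum('dk,ldk->l', M, X_doublets)

    # b from pairs with 0 < beta_ij < C, averaged for numerical robustness
    on_margin = (beta > tol) & (beta < C - tol)
    if not np.any(on_margin):
        raise ValueError("no pair with 0 < beta_ij < C")
    b = np.mean(h[on_margin] - margins[on_margin])

    # xi_ij = [1 - h_ij(<M, X_ij> + b)]_+
    xi = np.maximum(1.0 - h * (margins + b), 0.0)
    return alpha, M, b, xi
```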
Appendix 3: The Dual of the Subproblem of η in NCML
The subproblem of η, with β fixed, is defined as follows:
\[ \min_{\eta}\; \frac{1}{2}\sum_{i,j}\sum_{k,l}\eta_{ij}\eta_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + \sum_{i,j}\gamma_{ij}\eta_{ij} \quad \text{s.t.}\; \eta_{ij} \ge 0,\; \forall i,j, \]
where \( \gamma_{ij} = \sum\nolimits_{k,l}\beta_{kl}h_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \). Its Lagrangian is
\[ L = \frac{1}{2}\sum_{i,j}\sum_{k,l}\eta_{ij}\eta_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + \sum_{i,j}\gamma_{ij}\eta_{ij} - \sum_{i,j}\mu_{ij}\eta_{ij}, \]
where μ is the Lagrange multiplier that satisfies \( \mu_{ij} \ge 0 \), ∀i, j. Converting this problem to its dual requires the following KKT condition:
\[ \frac{\partial L}{\partial \eta_{ij}} = \sum_{k,l}\eta_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + \gamma_{ij} - \mu_{ij} = 0. \]
According to Eq. (2.86), the relationship between μ, η, and β is
\[ \mu_{ij} = \sum_{k,l}\left(\eta_{kl} + \beta_{kl}h_{kl}\right)\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle. \]
Parameterizing the dual variable analogously as \( \mu_{ij} = \sum\nolimits_{k,l}\nu_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \) and substituting Eqs. (2.86) and (2.87) back into the Lagrangian, we get the following Lagrange dual problem of the subproblem of η:
\[ \max_{\nu}\; -\frac{1}{2}\sum_{i,j}\sum_{k,l}\nu_{ij}\nu_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + \sum_{i,j}\sum_{k,l}\nu_{ij}\beta_{kl}h_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle - \frac{1}{2}\sum_{i,j}\sum_{k,l}\beta_{ij}\beta_{kl}h_{ij}h_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \quad \text{s.t.}\; \nu_{ij} \ge 0,\; \forall i,j. \]
Since β is fixed in this subproblem, the term \( \sum\nolimits_{i,j}\sum\nolimits_{k,l}\beta_{ij}\beta_{kl}h_{ij}h_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \) remains constant in Eq. (2.88). Thus, we can omit this term and derive the simplified Lagrange dual problem as follows:
\[ \max_{\nu}\; -\frac{1}{2}\sum_{i,j}\sum_{k,l}\nu_{ij}\nu_{kl}\left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle + \sum_{i,j}\gamma_{ij}\nu_{ij} \quad \text{s.t.}\; \nu_{ij} \ge 0,\; \forall i,j. \]
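As an illustration (not the solver used in the chapter), the simplified dual above is a QP with only nonnegativity constraints, so a few lines of projected gradient ascent suffice for a small problem. Here `K` is the kernel matrix with entries \( \left\langle\mathbf{X}_{ij},\mathbf{X}_{kl}\right\rangle \) and `gamma` is the vector γ; both names are our own:

```python
import numpy as np

def solve_eta_subproblem(K, gamma, n_iter=5000, tol=1e-10):
    """Projected gradient ascent for max_v -0.5 v^T K v + gamma^T v, v >= 0.

    K: (n, n) PSD kernel matrix with entries <X_ij, X_kl>.
    gamma: (n,) vector with gamma_ij = sum_kl beta_kl h_kl <X_ij, X_kl>.
    """
    n = K.shape[0]
    v = np.zeros(n)
    # Step size from the Lipschitz constant of the gradient
    step = 1.0 / (np.linalg.norm(K, 2) + 1e-12)
    for _ in range(n_iter):
        grad = gamma - K @ v                       # gradient of the concave objective
        v_new = np.maximum(v + step * grad, 0.0)   # project onto v >= 0
        if np.linalg.norm(v_new - v) < tol:
            v = v_new
            break
        v = v_new
    return v
```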
Appendix 4: The Dual of the Doublet-SVM
According to the original problem of the doublet-SVM defined in Eq. (2.56), its Lagrangian can be defined as follows:
\[ L = \frac{1}{2}\left\|\mathbf{M}\right\|_F^2 + C\sum_{l}\xi_{l} - \sum_{l}\alpha_{l}\left[h_{l}\left(\left\langle\mathbf{M},\mathbf{X}_{l}\right\rangle + b\right) - 1 + \xi_{l}\right] - \sum_{l}\beta_{l}\xi_{l}, \]
where α and β are the Lagrange multipliers that satisfy \( \alpha_{l} \ge 0 \) and \( \beta_{l} \ge 0 \), ∀l. Converting the original problem to its dual requires the following KKT conditions:
\[ \frac{\partial L}{\partial \mathbf{M}} = \mathbf{M} - \sum_{l}\alpha_{l}h_{l}\mathbf{X}_{l} = 0, \quad \frac{\partial L}{\partial b} = -\sum_{l}\alpha_{l}h_{l} = 0, \quad \frac{\partial L}{\partial \xi_{l}} = C - \alpha_{l} - \beta_{l} = 0. \]
According to Eq. (2.91), the relationship between M and α is
\[ \mathbf{M} = \sum_{l}\alpha_{l}h_{l}\mathbf{X}_{l}. \]
Substituting Eqs. (2.91)–(2.93) back into the Lagrangian, we have
\[ L = \sum_{l}\alpha_{l} - \frac{1}{2}\sum_{l}\sum_{m}\alpha_{l}\alpha_{m}h_{l}h_{m}\left\langle\mathbf{X}_{l},\mathbf{X}_{m}\right\rangle. \]
Thus, the dual problem of the doublet-SVM can be formulated as follows:
\[ \max_{\alpha}\; \sum_{l}\alpha_{l} - \frac{1}{2}\sum_{l}\sum_{m}\alpha_{l}\alpha_{m}h_{l}h_{m}\left\langle\mathbf{X}_{l},\mathbf{X}_{m}\right\rangle \quad \text{s.t.}\; \sum_{l}\alpha_{l}h_{l} = 0,\; 0 \le \alpha_{l} \le C. \]
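This dual has exactly the form of a standard SVM dual with the linear kernel \( \left\langle\mathbf{X}_{l},\mathbf{X}_{m}\right\rangle \) over doublets, which is why it can be solved with off-the-shelf SVM solvers. A minimal sketch using scikit-learn's precomputed-kernel interface follows; the wrapper and its names are our own illustration, not the chapter's code:

```python
import numpy as np
from sklearn.svm import SVC

def train_doublet_svm(X_doublets, h, C=1.0):
    """Train the doublet-SVM dual with a standard SVM solver.

    X_doublets: (n, d, d) doublet matrices X_l; h: (n,) labels in {-1, +1}.
    """
    n = X_doublets.shape[0]
    flat = X_doublets.reshape(n, -1)
    K = flat @ flat.T                      # K[l, m] = <X_l, X_m>

    svm = SVC(C=C, kernel='precomputed')
    svm.fit(K, h)

    # M = sum_l alpha_l h_l X_l; sklearn's dual_coef_ stores alpha_l * h_l
    coef = np.zeros(n)
    coef[svm.support_] = svm.dual_coef_.ravel()
    M = np.einsum('l,ldk->dk', coef, X_doublets)
    return M, svm.intercept_[0]
```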
Appendix 5: The Dual of the Triplet-SVM
According to the original problem of the triplet-SVM in Eq. (2.58), its Lagrangian can be defined as follows:
\[ L = \frac{1}{2}\left\|\mathbf{M}\right\|_F^2 + C\sum_{l}\xi_{l} - \sum_{l}\alpha_{l}\left[\left\langle\mathbf{M},\mathbf{X}_{l}\right\rangle - 1 + \xi_{l}\right] - \sum_{l}\beta_{l}\xi_{l}, \]
where α and β are the Lagrange multipliers. To convert the original problem to its dual, we set the derivatives of the Lagrangian to zero, which yields the following KKT conditions:
\[ \frac{\partial L}{\partial \mathbf{M}} = \mathbf{M} - \sum_{l}\alpha_{l}\mathbf{X}_{l} = 0, \quad \frac{\partial L}{\partial \xi_{l}} = C - \alpha_{l} - \beta_{l} = 0. \]
According to Eq. (2.98), the relationship between M and \( \alpha \) is
\[ \mathbf{M} = \sum_{l}\alpha_{l}\mathbf{X}_{l}. \]
Substituting Eqs. (2.98) and (2.99) back into the Lagrangian, we get
\[ L = \sum_{l}\alpha_{l} - \frac{1}{2}\sum_{l}\sum_{m}\alpha_{l}\alpha_{m}\left\langle\mathbf{X}_{l},\mathbf{X}_{m}\right\rangle. \]
Thus, the dual problem of the triplet-SVM can be rewritten as follows:
\[ \max_{\alpha}\; \sum_{l}\alpha_{l} - \frac{1}{2}\sum_{l}\sum_{m}\alpha_{l}\alpha_{m}\left\langle\mathbf{X}_{l},\mathbf{X}_{m}\right\rangle \quad \text{s.t.}\; 0 \le \alpha_{l} \le C. \]
Note that, unlike the doublet-SVM, there is no equality constraint here, because the triplet-SVM primal has no bias term.
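Because the triplet-SVM dual has only box constraints, simple projected gradient ascent is enough to illustrate it. The sketch below uses our own variable names and assumes the triplet matrices \( \mathbf{X}_{l} \) have already been built as in Eq. (2.58); it is a sketch under those assumptions, not the chapter's solver:

```python
import numpy as np

def train_triplet_svm(T_triplets, C=1.0, n_iter=5000, tol=1e-10):
    """Solve max_a sum(a) - 0.5 a^T K a, s.t. 0 <= a <= C, then form M.

    T_triplets: (n, d, d) array of triplet matrices X_l from Eq. (2.58).
    """
    n = T_triplets.shape[0]
    flat = T_triplets.reshape(n, -1)
    K = flat @ flat.T                      # K[l, m] = <X_l, X_m>

    a = np.zeros(n)
    step = 1.0 / (np.linalg.norm(K, 2) + 1e-12)
    for _ in range(n_iter):
        grad = 1.0 - K @ a                          # gradient of the dual objective
        a_new = np.clip(a + step * grad, 0.0, C)    # project onto the box [0, C]
        if np.linalg.norm(a_new - a) < tol:
            a = a_new
            break
        a = a_new

    # M = sum_l alpha_l X_l
    M = np.einsum('l,ldk->dk', a, T_triplets)
    return M, a
```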