Metric Learning with Biometric Applications

Abstract

Learning a desired distance metric from given training samples plays a significant role in machine learning. In this chapter, we first present two novel metric learning methods based on the support vector machine (SVM). We then present a kernel classification framework for metric learning that can be implemented efficiently with standard SVM solvers. Novel kernel metric learning methods, such as the doublet-SVM and the triplet-SVM, are also introduced in this chapter.

Notes

  1. http://www.csie.ntu.edu.tw/cjlin/libsvm/
  2. http://www.cs.berkeley.edu/~fowlkes/software/nca/
  3. http://www.cs.utexas.edu/~pjain/itml/
  4. http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html
  5. http://lear.inrialpes.fr/people/guillaumin/code.php
  6. http://www.cse.wustl.edu/~kilian/code/code.html/
  7. http://cui.unige.ch/~wangjun/
  8. http://empslocal.ex.ac.uk/people/staff/yy267/software.html
  9. http://www.csie.ntu.edu.tw/~cjlin/libsvm/
  10. http://www.cse.wustl.edu/~kilian/code/code.html
  11. http://www.cs.utexas.edu/~pjain/itml/
  12. http://lear.inrialpes.fr/people/guillaumin/code.php
  13. http://www.cs.berkeley.edu/~fowlkes/software/nca/
  14. http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html

Author information

Correspondence to David Zhang, Yong Xu or Wangmeng Zuo.

Appendices

Appendix 1: The Dual of PCML

The original problem of PCML is formulated as

$$ \begin{aligned} & \mathop {\hbox{min} }\limits_{{{\mathbf{M}},b,{\xi }}} \quad \frac{1}{2}\left\| {\mathbf{M}} \right\|_{F}^{2} + C\sum\limits_{i,j} {\xi_{ij} } \hfill \\ & {\text{s}} . {\text{t}} .\quad h_{ij} \left( {\left\langle {{\mathbf{M}},{\mathbf{X}}_{ij} } \right\rangle + b} \right) \ge 1 - \xi_{ij} ,\xi_{ij} \ge 0,\;\forall i,j,{\mathbf{M}} \, \succcurlyeq \, 0. \hfill \\ \end{aligned} $$
(2.60)

Its Lagrangian is

$$ \begin{aligned} L\left( {{\lambda },{\kappa },{\mathbf{Y}},{\mathbf{M}},b,{\xi }} \right) & = \frac{1}{2}\left\| {\mathbf{M}} \right\|_{F}^{2} + C\sum\limits_{i,j} {\xi_{ij} } - \sum\limits_{i,j} {\lambda_{ij} \left[ {h_{ij} \left( {\left\langle {{\mathbf{M}},{\mathbf{X}}_{ij} } \right\rangle + b} \right) - 1 + \xi_{ij} } \right]} \\ & \quad - \sum\limits_{i,j} {\kappa_{ij} \xi_{ij} } - \left\langle {{\mathbf{Y}},{\mathbf{M}}} \right\rangle, \\ \end{aligned} $$
(2.61)

where λ, κ, and Y are the Lagrange multipliers, which satisfy λ ij  ≥ 0, κ ij  ≥ 0, ∀i, j, and \( {\mathbf{Y}} \, \succcurlyeq \, 0 \). Based on the KKT conditions, the original problem can be converted into its dual problem. The KKT conditions are as follows:

$$ \frac{{\partial L\left( {{\lambda },{\kappa },{\mathbf{Y}},{\mathbf{M}},b,{\xi }} \right)}}{{\partial {\mathbf{M}}}} = {0} \Rightarrow {\mathbf{M}} - \sum\limits_{i,j} {\lambda_{ij} h_{ij} {\mathbf{X}}_{ij} } - {\mathbf{Y}} = {0}, $$
(2.62)
$$ \frac{{\partial L\left( {{\lambda },{\kappa },{\mathbf{Y}},{\mathbf{M}},b,{\xi }} \right)}}{\partial b} = 0 \Rightarrow \sum\limits_{i,j} {\lambda_{ij} h_{ij} } = 0, $$
(2.63)
$$ \frac{{\partial L\left( {{\lambda },{\kappa },{\mathbf{Y}},{\mathbf{M}},b,{\xi }} \right)}}{{\partial \xi_{ij} }} = C - \lambda_{ij} - \kappa_{ij} = 0 \Rightarrow 0 \le \lambda_{ij} \le C,\;\forall i,j, $$
(2.64)
$$ h_{ij} \left( {\left\langle {{\mathbf{M}},{\mathbf{X}}_{ij} } \right\rangle + b} \right) - 1 + \xi_{ij} \ge 0,\quad \xi_{ij} \ge 0, $$
(2.65)
$$ \lambda_{ij} \ge 0,\quad \kappa_{ij} \ge 0,\quad {\mathbf{Y}} \, \succcurlyeq \, 0, $$
(2.66)
$$ \lambda_{ij} \left[ {h_{ij} \left( {\left\langle {{\mathbf{M}},{\mathbf{X}}_{ij} } \right\rangle + b} \right) - 1 + \xi_{ij} } \right] = 0,\;\kappa_{ij} \xi_{ij} = 0. $$
(2.67)

Equation (2.62) implies the relationship between \( {\lambda } \), Y and M as follows:

$$ {\mathbf{M}} = \sum\limits_{i,j} {\lambda_{ij} h_{ij} {\mathbf{X}}_{ij} } + {\mathbf{Y}}. $$
(2.68)

Substituting Eqs. (2.62)–(2.64) back into the Lagrangian, we obtain the following Lagrange dual problem of PCML:

$$ \begin{aligned} & \mathop {\hbox{max} }\limits_{{{\lambda },{\mathbf{Y}}}} \quad - \frac{1}{2}\left\| {\sum\limits_{i,j} {\lambda_{ij} h_{ij} {\mathbf{X}}_{ij} } + {\mathbf{Y}}} \right\|_{F}^{2} + \sum\limits_{i,j} {\lambda_{ij} } \\ & {\text{s}} . {\text{t}} .\quad \sum\limits_{i,j} {\lambda_{ij} h_{ij} } = 0,0 \le \lambda_{ij} \le C,{\mathbf{Y}} \, \succcurlyeq \, 0. \\ \end{aligned} $$
(2.69)

From Eqs. (2.68) and (2.69), we can see that the matrix M is explicitly determined by the training procedure, whereas b is not. Nevertheless, b can easily be obtained from the KKT conditions in Eqs. (2.64) and (2.67), which show that ξ ij  = 0 if λ ij  < C, and \( h_{ij} \left( {\left\langle {{\mathbf{M}},{\mathbf{X}}_{ij} } \right\rangle + b} \right) - 1 + \xi_{ij} = 0 \) if λ ij  > 0. Thus, we can take any training pair for which 0 < λ ij  < C and calculate b by

$$ b = \frac{1}{{h_{ij} }} - \left\langle {{\mathbf{M}},{\mathbf{X}}_{ij} } \right\rangle ,\quad {\text{for}}\;{\text{any}}\;0 < \lambda_{ij} < C. $$
(2.70)

In practice, it is more robust to average b over all such training pairs. Once b is obtained, we can calculate ξ ij by

$$ \xi_{ij} = \left\{ {\begin{array}{*{20}l} {0\quad {\text{for}}\;{\text{all}}\;\lambda_{ij} < C} \hfill \\ {\left[ {1 - h_{ij} \left( {\left\langle {{\mathbf{M}},{\mathbf{X}}_{ij} } \right\rangle + b} \right)} \right]_{ + } \quad {\text{for}}\;{\text{all}}\;\lambda_{ij} = C}, \hfill \\ \end{array} } \right. $$
(2.71)

where [z]+ = max(z, 0) denotes the standard hinge loss.
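
To make the recovery of b and ξ concrete, the following Python sketch implements Eqs. (2.68), (2.70), and (2.71). It assumes a dual solution (lam, Y) is already available from some solver, together with the pair labels h and the doublet matrices X; the variable names are illustrative and not taken from the chapter.

```python
import numpy as np

def recover_pcml_parameters(lam, h, X, Y, C, tol=1e-8):
    """Recover M, b, and the slacks xi from a PCML dual solution.

    lam : (n,) dual variables lambda_ij, flattened over the training pairs
    h   : (n,) pair labels h_ij in {+1, -1}
    X   : (n, d, d) doublet matrices X_ij
    Y   : (d, d) positive semidefinite multiplier matrix
    C   : regularization parameter of Eq. (2.60)
    """
    # Eq. (2.68): M = sum_ij lambda_ij * h_ij * X_ij + Y
    M = np.einsum('n,n,nij->ij', lam, h, X) + Y

    # Frobenius inner products <M, X_ij> for every pair
    scores = np.einsum('ij,nij->n', M, X)

    # Eq. (2.70): average b over the pairs with 0 < lambda_ij < C
    on_margin = (lam > tol) & (lam < C - tol)
    b = np.mean(1.0 / h[on_margin] - scores[on_margin])

    # Eq. (2.71): xi_ij = 0 when lambda_ij < C, hinge value when lambda_ij = C
    xi = np.where(lam < C - tol,
                  0.0,
                  np.maximum(0.0, 1.0 - h * (scores + b)))
    return M, b, xi
```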

Appendix 2: The Dual of NCML

The primal problem of NCML is as follows:

$$ \begin{aligned} & \mathop {\hbox{min} }\limits_{{{\alpha },b,{\xi }}} \quad \frac{1}{2}\sum\limits_{i,j} {\sum\limits_{k,l} {\alpha_{ij} \alpha_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } } + C\sum\limits_{i,j} {\xi_{ij} } \\ & {\text{s}} . {\text{t}} .\quad h_{ij} \left( {\sum\limits_{k,l} {\alpha_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } + b} \right) \ge 1 - \xi_{ij} ,\quad \xi_{ij} \ge 0, \;\;\alpha_{ij} \ge 0,\;\;\forall i,j. \\ \end{aligned} $$
(2.72)

Its Lagrangian can be defined as

$$ \begin{aligned} L\left( {{\beta },{\sigma },{\nu },{\alpha },b,{\xi }} \right) & = \frac{1}{2}\sum\limits_{i,j} {\sum\limits_{k,l} {\alpha_{ij} \alpha_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } } + C\sum\limits_{i,j} {\xi_{ij} } - \sum\limits_{i,j} {\sigma_{ij} \alpha_{ij} } \\ & \quad - \sum\limits_{i,j} {\beta_{ij} \left[ {h_{ij} \left( {\sum\nolimits_{kl} {\alpha_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } + b} \right) - 1 + \xi_{ij} } \right]} - \sum\limits_{i,j} {\nu_{ij} \xi_{ij} }, \\ \end{aligned} $$
(2.73)

where β, σ, and ν are the Lagrange multipliers, which satisfy β ij  ≥ 0, σ ij  ≥ 0, and ν ij  ≥ 0, ∀i, j. Converting the original problem to its dual requires the following KKT conditions:

$$ \frac{{\partial L\left( {{\beta },{\sigma },{\nu },{\alpha },b,{\xi }} \right)}}{{\partial \alpha_{ij} }} = 0 \Rightarrow \sum\limits_{k,l} {\alpha_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } - \sum\limits_{k,l} {\beta_{kl} h_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } - \sigma_{ij} = 0, $$
(2.74)
$$ \frac{{\partial L\left( {{\beta },{\sigma },{\nu },{\alpha },b,{\xi }} \right)}}{\partial b} = 0 \Rightarrow \sum\limits_{i,j} {\beta_{ij} h_{ij} } = 0, $$
(2.75)
$$ \frac{{\partial L\left( {{\beta },{\sigma },{\nu },{\alpha },b,{\xi }} \right)}}{{\partial \xi_{ij} }} = 0 \Rightarrow C - \beta_{ij} - \nu_{ij} = 0 \Rightarrow 0 \le \beta_{ij} \le C, $$
(2.76)
$$ h_{ij} \left( {\sum\limits_{k,l} {\alpha_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } + b} \right) - 1 + \xi_{ij} \ge 0,\quad \xi_{ij} \ge 0,\;\;\alpha_{ij} \ge 0,\;\;\forall i,j, $$
(2.77)
$$ \beta_{ij} \ge 0,\;\sigma_{ij} \ge 0,\;\nu_{ij} \ge 0,\quad \forall i,j, $$
(2.78)
$$ \beta_{ij} \left[ {h_{ij} \left( {\sum\limits_{kl} {\alpha_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } + b} \right) - 1 + \xi_{ij} } \right] = 0,\quad \nu_{ij} \xi_{ij} = 0,\;\;\sigma_{ij} \alpha_{ij} = 0,\;\; \forall i,j. $$
(2.79)

Here, we introduce a coefficient vector η that satisfies \( \sigma_{ij} = \sum\limits_{k,l} {\eta_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } \), where \( \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle \) denotes a positive definite kernel. Because the kernel is positive definite, every η has a unique corresponding σ, and vice versa. According to Eq. (2.74), the relationship between α, β, and η is

$$ \alpha_{ij} = \beta_{ij} h_{ij} + \eta_{ij} ,\quad \forall i,j. $$
(2.80)

Substituting Eqs. (2.74)–(2.76) back into the Lagrangian, the Lagrange dual problem of NCML can be written as follows:

$$ \begin{aligned} & \mathop {\hbox{max} }\limits_{{{\eta },{\beta }}} \quad - \frac{1}{2}\sum\limits_{i,j} {\sum\limits_{k,l} {\left( {\beta_{ij} h_{ij} + \eta_{ij} } \right)\left( {\beta_{kl} h_{kl} + \eta_{kl} } \right)\left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } } + \sum\limits_{i,j} {\beta_{ij} } \\ & {\text{s}} . {\text{t}} .\quad \sum\limits_{k,l} {\eta_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } \ge 0, \quad 0 \le \beta_{ij} \le C, \;\; \forall i,j, \quad \sum\limits_{i,j} {\beta_{ij} h_{ij} } = 0. \\ \end{aligned} $$
(2.81)

Analogous to PCML, we can use the KKT conditions to compute b and ξ ij in NCML. Equations (2.76) and (2.79) show that ξ ij  = 0 if β ij  < C, and \( h_{ij} \left( {\sum\nolimits_{kl} {\alpha_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } + b} \right) - 1 + \xi_{ij} = 0 \) if β ij  > 0. With any training pair for which 0 < β ij  < C, b can be obtained by

$$ b = \frac{1}{{h_{ij} }} - \sum\limits_{kl} {\alpha_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle }. $$
(2.82)

The slack variables ξ ij can then be obtained by

$$ \xi_{ij} = \left\{ {\begin{array}{*{20}l} {0\quad {\text{for}}\;{\text{all}}\;\beta_{ij} < C} \hfill \\ {\left[ {1 - h_{ij} \left( {\sum\limits_{k,l} {\alpha_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } + b} \right)} \right]_{ + } \quad {\text{for}}\;{\text{all}}\;\beta_{ij} = C}, \hfill \\ \end{array} } \right. $$
(2.83)

where [z]+ = max(z, 0) denotes the standard hinge loss.
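
Every quantity in the NCML dual depends on the data only through the inner products \( \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle \). If, as in the doublet construction of Appendix 4, X ij is taken to be the outer product (x i  − x j )(x i  − x j )T (an assumption made here for illustration), then ⟨X ij , X kl ⟩ = ((x i  − x j )T(x k  − x l ))2, so the whole Gram matrix can be formed from difference vectors without materializing any d × d matrix. A minimal sketch:

```python
import numpy as np

def doublet_gram(diffs):
    """Gram matrix K[p, q] = <X_p, X_q>_F for doublet matrices X_p = d_p d_p^T.

    diffs : (n, d) array whose rows are the difference vectors x_i - x_j.
    Uses <u u^T, v v^T>_F = (u^T v)^2, so no d x d matrix is ever formed.
    """
    inner = diffs @ diffs.T     # pairwise dot products d_p^T d_q
    return inner ** 2           # elementwise square gives the Frobenius inner products

# Illustrative usage with random data and hypothetical pair indices.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))                  # six samples in R^4
pairs = [(0, 1), (2, 3), (4, 5)]                 # hypothetical doublets (i, j)
D = np.stack([x[i] - x[j] for i, j in pairs])    # difference vectors
K = doublet_gram(D)                              # 3 x 3 kernel matrix used throughout NCML
```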

Appendix 3: The Dual of the Subproblem of η in NCML

The subproblem of η is defined as follows:

$$ \begin{aligned} & \mathop {\hbox{min} }\limits_{\eta } \quad \frac{1}{2}\sum\limits_{i,j} {\sum\limits_{k,l} {\eta_{ij} \eta_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } } + \sum\limits_{i,j} {\eta_{ij} \gamma_{ij} } \\ & {\text{s}} . {\text{t}} .\quad \sum\limits_{k,l} {\eta_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } \ge 0,\quad \forall i,j, \\ \end{aligned} $$
(2.84)

where \( \gamma_{ij} = \sum\limits_{k,l} {\beta_{kl} h_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } \). Its Lagrangian is

$$ L\left( {{\mu },{\eta }} \right) = \frac{1}{2}\sum\limits_{i,j} {\sum\limits_{k,l} {\eta_{ij} \eta_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } } + \sum\limits_{i,j} {\eta_{ij} \gamma_{ij} } - \sum\limits_{i,j} {\mu_{ij} \sum\limits_{k,l} {\eta_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } }, $$
(2.85)

where μ is the vector of Lagrange multipliers, which satisfy μ ij  ≥ 0, ∀i, j. Converting this problem to its dual requires the following KKT condition:

$$ \frac{{\partial L\left( {{\mu },{\eta }} \right)}}{{\partial \eta_{ij} }} = 0 \Rightarrow \sum\limits_{k,l} {\eta_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } + \gamma_{ij} - \sum\limits_{k,l} {\mu_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } = 0. $$
(2.86)

According to Eq. (2.86), the relationship between μ, η and β is

$$ \eta_{ij} = \mu_{ij} - h_{ij} \beta_{ij} ,\quad \forall i,j. $$
(2.87)

Substituting Eqs. (2.86) and (2.87) back into the Lagrangian, we get the following Lagrange dual problem of the subproblem of η:

$$ \begin{aligned} & \mathop {\hbox{max} }\limits_{\mu } \quad - \frac{1}{2}\sum\limits_{i,j} {\sum\limits_{k,l} {\mu_{ij} \mu_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } } + \sum\limits_{i,j} {\gamma_{ij} \mu_{ij} } \\ & \quad - \frac{1}{2}\sum\limits_{i,j} {\sum\limits_{k,l} {\beta_{ij} \beta_{kl} h_{ij} h_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } } \\ & \quad {\text{s}} . {\text{t}} .\quad \mu_{ij} \ge 0,\forall i,j. \\ \end{aligned} $$
(2.88)

Since β is fixed in this subproblem, \( \sum\nolimits_{i,j} {\sum\nolimits_{k,l} {\beta_{ij} \beta_{kl} h_{ij} h_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } } \) remains constant in Eq. (2.88). Thus, we can omit this term and derive the simplified Lagrange dual problem as follows:

$$ \begin{aligned} & \mathop {\hbox{max} }\limits_{\mu } \quad - \frac{1}{2}\sum\limits_{i,j} {\sum\limits_{k,l} {\mu_{ij} \mu_{kl} \left\langle {{\mathbf{X}}_{ij} ,{\mathbf{X}}_{kl} } \right\rangle } } + \sum\limits_{i,j} {\gamma_{ij} \mu_{ij} } \\ & {\text{s}} . {\text{t}} .\quad \mu_{ij} \ge 0, \quad \forall i,j. \\ \end{aligned} $$
(2.89)
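
Because the simplified dual (2.89) is a quadratic program with only nonnegativity constraints, it can be solved with very simple machinery. The sketch below uses plain projected gradient ascent (one possible solver, not necessarily the one used in the chapter); K denotes the Gram matrix of ⟨X ij , X kl ⟩, gamma the vector defined after Eq. (2.84), and η is recovered afterwards through Eq. (2.87).

```python
import numpy as np

def solve_eta_subproblem(K, gamma, beta, h, n_iter=500):
    """Solve the dual (2.89) for mu by projected gradient, then recover eta via (2.87).

    K       : (n, n) Gram matrix with entries <X_ij, X_kl>
    gamma   : (n,) vector gamma_ij defined after Eq. (2.84)
    beta, h : (n,) current beta variables and pair labels, used only in Eq. (2.87)
    """
    n = K.shape[0]
    mu = np.zeros(n)
    # Maximizing (2.89) equals minimizing 0.5 * mu^T K mu - gamma^T mu over mu >= 0.
    step = 1.0 / (np.linalg.norm(K, 2) + 1e-12)    # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = K @ mu - gamma                      # gradient of the minimization objective
        mu = np.maximum(0.0, mu - step * grad)     # gradient step + projection onto mu >= 0
    eta = mu - h * beta                            # Eq. (2.87): eta_ij = mu_ij - h_ij * beta_ij
    return eta, mu
```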

Appendix 4: The Dual of the Doublet-SVM

According to the original problem of the doublet-SVM in Eq. (2.56), its Lagrange function can be defined as follows:

$$ \begin{aligned} L\left( {{\mathbf{M}},b,{\xi },{\alpha },{\beta }} \right) & = \frac{1}{2}\left\| {\mathbf{M}} \right\|_{F}^{2} + C\sum\limits_{l} {\xi_{l} } \\ & \quad - \sum\limits_{l} {\alpha_{l} \left[ {h_{l} \left( {({\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,2} )^{\text{T}} {\mathbf{M}}({\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,2} ) + b} \right) - 1 + \xi_{l} } \right]} - \sum\limits_{l} {\beta_{l} \xi_{l} }, \\ \end{aligned} $$
(2.90)

where α and β are the Lagrange multipliers, which satisfy α l  ≥ 0 and β l  ≥ 0, ∀l. Converting the original problem to its dual requires the following KKT conditions:

$$ \frac{{\partial L\left( {{\mathbf{M}},b,{\xi },{\alpha },{\beta }} \right)}}{{\partial {\mathbf{M}}}} = {0} \Rightarrow {\mathbf{M}} - \sum\limits_{l} {\alpha_{l} h_{l} \left( {{\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,2} } \right)\left( {{\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,2} } \right)^{\text{T}} } = {0}, $$
(2.91)
$$ \frac{{\partial L\left( {{\mathbf{M}},b,{\xi },{\alpha },{\beta }} \right)}}{\partial b} = 0 \Rightarrow \sum\limits_{l} {\alpha_{l} h_{l} } = 0, $$
(2.92)
$$ \frac{{\partial L\left( {{\mathbf{M}},b,{\xi },{\alpha },{\beta }} \right)}}{{\partial \xi_{l} }} = 0 \Rightarrow C - \alpha_{l} - \beta_{l} = 0 \Rightarrow 0 \le \alpha_{l} \le C,\quad \forall l. $$
(2.93)

According to Eq. (2.91), the relationship between M and α is

$$ {\mathbf{M}} = \sum\limits_{l} {\alpha_{l} h_{l} \left( {{\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,2} } \right)\left( {{\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,2} } \right)^{\text{T}} }. $$
(2.94)

Substituting Eqs. (2.91)–(2.93) back into the Lagrangian function, we have

$$ L\left( {\alpha } \right) = - \frac{1}{2}\sum\limits_{i,j} {\alpha_{i} \alpha_{j} h_{i} h_{j} K_{p} \left( {{\mathbf{z}}_{i} ,{\mathbf{z}}_{j} } \right)} + \sum\limits_{i} {\alpha_{i} }. $$
(2.95)

Thus, the dual problem of the doublet-SVM can be formulated as follows:

$$ \begin{aligned} & \mathop {\hbox{max} }\limits_{\alpha } \, - \frac{1}{2}\sum\limits_{i,j} {\alpha_{i} \alpha_{j} h_{i} h_{j} K_{p} \left( {{\mathbf{z}}_{i} ,{\mathbf{z}}_{j} } \right)} + \sum\limits_{i} {\alpha_{i} } \\ & {\text{s}} . {\text{t}} .\quad 0 \le \alpha_{l} \le C, \quad \sum\limits_{l} {\alpha_{l} h_{l} } = 0,\quad \forall l. \\ \end{aligned} $$
(2.96)
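
The dual (2.96) has exactly the form of a standard soft-margin SVM dual over doublets, which is why the doublet-SVM can be trained with off-the-shelf solvers such as LIBSVM. The sketch below uses scikit-learn's SVC with a precomputed kernel as a stand-in for such a solver; the kernel choice K p (z i , z j ) = ((x i,1  − x i,2 )T(x j,1  − x j,2 ))2 is the linear (Frobenius) kernel between the doublet matrices implied by Eq. (2.91), and the data, labels, and names are purely illustrative. M is then recovered via Eq. (2.94).

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative doublets: row l of x1 and x2 holds the two samples of doublet l,
# and h[l] is +1 for a similar pair and -1 for a dissimilar one (random here).
rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal((2, 20, 5))
h = np.where(rng.standard_normal(20) > 0, 1.0, -1.0)

D = x1 - x2                       # difference vector of each doublet
K = (D @ D.T) ** 2                # K_p(z_i, z_j) = ((x_i1 - x_i2)^T (x_j1 - x_j2))^2

svm = SVC(C=1.0, kernel='precomputed')
svm.fit(K, h)                     # a standard SVM solver handles the dual (2.96)

# Eq. (2.94): M = sum_l alpha_l h_l (x_l1 - x_l2)(x_l1 - x_l2)^T.
# SVC stores alpha_l * h_l for the support doublets in dual_coef_.
coef = svm.dual_coef_.ravel()
D_sv = D[svm.support_]
M = np.einsum('l,li,lj->ij', coef, D_sv, D_sv)
b = svm.intercept_[0]             # bias of the doublet classifier
```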

Appendix 5: The Dual of the Triplet-SVM

According to the original problem of the triplet-SVM in Eq. (2.58), its Lagrange function can be defined as follows:

$$ \begin{aligned} L\left( {{\mathbf{M}},{\xi },{\alpha },{\beta }} \right) & = \frac{1}{2}\left\| {\mathbf{M}} \right\|_{F}^{2} + C\sum\limits_{l} {\xi_{l} } - \sum\limits_{l} {\alpha_{l} [} ({\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,3} )^{\text{T}} {\mathbf{M}}({\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,3} ) \\ & \quad - ({\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,2} )^{\text{T}} {\mathbf{M}}({\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,2} )] + \sum\limits_{l} {\alpha_{l} } - \sum\limits_{l} {\alpha_{l} \xi_{l} } - \sum\limits_{l} {\beta_{l} \xi_{l} }, \\ \end{aligned} $$
(2.97)

where α and β are the Lagrange multipliers, which satisfy α l  ≥ 0 and β l  ≥ 0, ∀l. To convert the original problem to its dual, we set the derivatives of the Lagrangian to zero, which yields the following KKT conditions:

$$ \begin{aligned} & \frac{{\partial L\left( {{\mathbf{M}},{\xi },{\alpha },{\beta }} \right)}}{{\partial {\mathbf{M}}}} = {0} \Rightarrow \\ & {\mathbf{M}} - \sum\limits_{l} {\alpha_{l} \left[ {\left( {{\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,3} } \right)\left( {{\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,3} } \right)^{\text{T}} - \left( {{\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,2} } \right)\left( {{\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,2} } \right)^{\text{T}} } \right]} = {0}, \\ \end{aligned} $$
(2.98)
$$ \frac{{\partial L\left( {{\mathbf{M}},{\xi },{\alpha },{\beta }} \right)}}{{\partial \xi_{l} }} = 0 \Rightarrow C - \alpha_{l} - \beta_{l} = 0 \Rightarrow 0 \le \alpha_{l} \le C,\quad \forall l. $$
(2.99)

According to Eq. (2.98), the relationship between M and \( {\alpha } \) is:

$$ {\mathbf{M}} = \sum\limits_{l} {\alpha_{l} \left[ {\left( {{\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,3} } \right)\left( {{\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,3} } \right)^{\text{T}} - \left( {{\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,2} } \right)\left( {{\mathbf{x}}_{l,1} - {\mathbf{x}}_{l,2} } \right)^{\text{T}} } \right]}. $$
(2.100)

Substituting Eqs. (2.98) and (2.99) back into the Lagrangian, we get

$$ L\left( {\alpha } \right) = - \frac{1}{2}\sum\limits_{i,j} {\alpha_{i} \alpha_{j} K_{p} \left( {{\mathbf{t}}_{i} ,{\mathbf{t}}_{j} } \right)} + \sum\limits_{i} {\alpha_{i} }. $$
(2.101)

Thus, the dual problem of the triplet-SVM can be rewritten as follows:

$$ \begin{aligned} & \mathop {\hbox{max} }\limits_{\alpha } \, - \frac{1}{2}\sum\limits_{i,j} {\alpha_{i} \alpha_{j} K_{p} \left( {{\mathbf{t}}_{i} ,{\mathbf{t}}_{j} } \right)} + \sum\limits_{i} {\alpha_{i} } \\ & {\text{s}} . {\text{t}} .\quad 0 \le \alpha_{l} \le C,\quad \forall l. \\ \end{aligned} $$
(2.102)
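
For the triplet-SVM, the data enter the dual (2.102) only through the triplet kernel K p (t i , t j ). Assuming the linear (Frobenius) choice K p (t i , t j ) = ⟨T i , T j ⟩ with T l  = (x l,1  − x l,3 )(x l,1  − x l,3 )T − (x l,1  − x l,2 )(x l,1  − x l,2 )T, as suggested by Eq. (2.100), the kernel expands into four squared dot products of difference vectors, and M is recovered from the dual solution by Eq. (2.100). A minimal sketch with hypothetical names:

```python
import numpy as np

def triplet_kernel(A, B):
    """Kernel matrix K[i, j] = <T_i, T_j>_F for T_l = a_l a_l^T - b_l b_l^T.

    A : (n, d) rows a_l = x_{l,1} - x_{l,3}
    B : (n, d) rows b_l = x_{l,1} - x_{l,2}
    """
    return ((A @ A.T) ** 2 - (A @ B.T) ** 2
            - (B @ A.T) ** 2 + (B @ B.T) ** 2)

def recover_triplet_M(alpha, A, B):
    """Eq. (2.100): M = sum_l alpha_l (a_l a_l^T - b_l b_l^T)."""
    return (np.einsum('l,li,lj->ij', alpha, A, A)
            - np.einsum('l,li,lj->ij', alpha, B, B))
```

Since the dual (2.102) contains only box constraints and no equality constraint, the same projected-gradient scheme sketched for the η subproblem applies, with an additional clip at C after each step.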

Copyright information

© 2016 Springer Science+Business Media Singapore

Cite this chapter

Zhang, D., Xu, Y., Zuo, W. (2016). Metric Learning with Biometric Applications. In: Discriminative Learning in Biometrics. Springer, Singapore. https://doi.org/10.1007/978-981-10-2056-8_2

  • DOI: https://doi.org/10.1007/978-981-10-2056-8_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2055-1

  • Online ISBN: 978-981-10-2056-8
