
Deep Metric Learning with Hierarchical Triplet Loss

  • Weifeng Ge
  • Weilin Huang (corresponding author)
  • Dengke Dong
  • Matthew R. Scott
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11210)

Abstract

We present a novel hierarchical triplet loss (HTL) capable of automatically collecting informative training samples (triplets) via a defined hierarchical tree that encodes global context information. This allows us to cope with the main limitation of random sampling in training a conventional triplet loss, which is a central issue for deep metric learning. Our main contributions are two-fold. (i) We construct a hierarchical class-level tree where neighboring classes are merged recursively; the hierarchical structure naturally captures the intrinsic data distribution over the whole dataset. (ii) We formulate the problem of triplet collection by introducing a new violate margin, which is computed dynamically based on the designed hierarchical tree. This allows the model to automatically select meaningful hard samples under the guidance of global context, encouraging it to learn more discriminative features from visually similar classes and leading to faster convergence and better performance. Our method is evaluated on the tasks of image retrieval and face recognition, where it outperforms the standard triplet loss substantially by 1%–18% and achieves new state-of-the-art performance on a number of benchmarks.
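The two components described in the abstract, a class-level tree built by recursively merging neighboring classes and a violate margin computed from the level at which two classes first merge in that tree, can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the simple threshold-based merging, and the linear margin schedule are assumptions chosen for clarity.

```python
import numpy as np

def build_hierarchy(class_centers, num_levels=4):
    """Merge classes into clusters at increasing distance thresholds,
    producing one partition of the classes per tree level.
    (Illustrative stand-in for the paper's recursive class merging.)"""
    d = np.linalg.norm(class_centers[:, None] - class_centers[None, :], axis=-1)
    thresholds = np.linspace(d[d > 0].min(), d.max(), num_levels)
    partitions = []
    for t in thresholds:
        # naive union-find: classes whose centers lie within t share a cluster
        labels = np.arange(len(class_centers))
        for i in range(len(class_centers)):
            for j in range(i + 1, len(class_centers)):
                if d[i, j] <= t:
                    labels[labels == labels[j]] = labels[i]
        partitions.append(labels)
    return partitions

def violate_margin(partitions, cls_anchor, cls_negative,
                   base_margin=0.2, step=0.1):
    """Margin grows with the tree level at which the anchor class and the
    negative class first fall into the same cluster, so visually similar
    classes (which merge early) get a tighter margin. The linear schedule
    here is an assumption; the paper derives its margin differently."""
    for level, labels in enumerate(partitions):
        if labels[cls_anchor] == labels[cls_negative]:
            return base_margin + step * level
    return base_margin + step * len(partitions)

def triplet_loss(anchor, positive, negative, margin):
    """Standard triplet loss with the dynamically chosen margin."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)
```

Under this sketch, a negative from a class that merges with the anchor's class only near the root receives a larger margin than one from a nearby class, so the sampled triplets adapt to the global class structure rather than being drawn uniformly at random.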

Keywords

Deep metric learning · Image retrieval · Triplet loss · Anchor-neighbor sampling


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Weifeng Ge (1, 2, 3)
  • Weilin Huang (1, 2), corresponding author
  • Dengke Dong (1, 2)
  • Matthew R. Scott (1, 2)
  1. Malong Technologies, Shenzhen, China
  2. Shenzhen Malong Artificial Intelligence Research Center, Shenzhen, China
  3. The University of Hong Kong, Pok Fu Lam, Hong Kong
