The Group Loss for Deep Metric Learning

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12352)

Abstract

Deep metric learning has yielded impressive results in tasks such as clustering and image retrieval by leveraging neural networks to obtain highly discriminative feature embeddings, which can be used to group samples into different classes. Much research has been devoted to the design of smart loss functions or data mining strategies for training such networks. Most methods consider only pairs or triplets of samples within a mini-batch to compute the loss function, which is commonly based on the distance between embeddings. We propose Group Loss, a loss function based on a differentiable label-propagation method that enforces embedding similarity across all samples of a group while promoting, at the same time, low-density regions amongst data points belonging to different groups. Guided by the smoothness assumption that “similar objects should belong to the same group”, the proposed loss trains the neural network for a classification task, enforcing a consistent labelling amongst samples within a class. We show state-of-the-art results on clustering and image retrieval on several datasets, and demonstrate the potential of our method when combined with other techniques such as ensembles. To facilitate further research, we make available the code and the models at https://github.com/dvl-tum/group_loss.
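The abstract outlines the core mechanism: pairwise similarities within a mini-batch drive an iterative label-propagation step whose refined soft labels are then scored with a standard classification loss. Below is a minimal PyTorch sketch of that idea, not the authors' reference implementation (see the linked repository for that): the function name group_loss, the clipped Pearson-correlation similarity, the anchor-selection heuristic, and the hyper-parameters num_iters and num_anchors are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def group_loss(embeddings, labels, num_classes, num_iters=3, num_anchors=1):
    """Sketch of a Group-Loss-style objective: refine soft label assignments
    with a multiplicative label-propagation update over a similarity graph,
    then apply cross-entropy against the ground-truth labels.
    Names and heuristics here are illustrative, not the paper's exact recipe."""
    n = embeddings.size(0)

    # Pairwise similarity: Pearson correlation of embeddings,
    # with negative correlations clipped to zero (an assumption).
    z = embeddings - embeddings.mean(dim=1, keepdim=True)
    z = F.normalize(z, dim=1)
    W = torch.relu(z @ z.t())
    W.fill_diagonal_(0.0)  # a sample gives no support to itself

    # Soft assignments: uniform everywhere, except a few one-hot
    # "anchor" samples per class that seed the propagation.
    X = torch.full((n, num_classes), 1.0 / num_classes,
                   device=embeddings.device)
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0][:num_anchors]
        X[idx] = 0.0
        X[idx, c] = 1.0

    # Label propagation: each sample reweights its class beliefs by the
    # support it receives from similar samples, renormalised per row.
    for _ in range(num_iters):
        support = W @ X
        X = X * support
        X = X / X.sum(dim=1, keepdim=True).clamp_min(1e-12)

    # Classification loss on the refined soft labels.
    return F.nll_loss(torch.log(X.clamp_min(1e-12)), labels)
```

For the within-group support to be meaningful, batches would typically be sampled so that each class contributes several images; with only one sample per class, the propagation step has little signal to exploit.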

Keywords

Deep metric learning · Image retrieval · Image clustering

Notes

Acknowledgements

This research was partially funded by the Humboldt Foundation through the Sofja Kovalevskaja Award. We thank Michele Fenzi, Maxim Maximov and Guillem Braso Andilla for useful discussions.

Supplementary material

Supplementary material 1: 504444_1_En_17_MOESM1_ESM.pdf (PDF, 2063 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Ca’ Foscari University of Venice, Venice, Italy
  2. Technical University of Munich, Munich, Germany