DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning

Milbich, Timo; Roth, Karsten; Bharadhwaj, Homanga; Sinha, Samarth; Bengio, Yoshua; Ommer, Björn; Cohen, Joseph Paul

doi:10.1007/978-3-030-58598-3_35

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12353)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Visual similarity plays an important role in many computer vision applications. Deep metric learning (DML) is a powerful framework for learning such similarities, which not only generalize from training data to identically distributed test data, but in particular also transfer to unknown test classes. Its prevailing learning paradigm, however, is class-discriminative supervised training, which typically results in representations specialized in separating training classes. Effective generalization instead requires an image representation that captures a diverse range of data characteristics. To this end, we propose and study multiple complementary learning tasks, which target conceptually different data relationships while relying only on the training samples and labels available in a standard DML setting. By optimizing these tasks simultaneously, we learn a single model that aggregates their training signals, resulting in strong generalization and state-of-the-art performance on multiple established DML benchmark datasets.

T. Milbich, K. Roth, B. Ommer and J. P. Cohen—Equal first and last authorship.

Notes

1. To compute \(d\), we use the Euclidean distance between samples. Since \(\phi\) is regularized to the unit hypersphere \(\mathbb{S}^{D-1}\), the Euclidean distance is monotonically related to the cosine distance.
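The relationship stated in this note can be checked numerically. The sketch below is illustrative only and is not taken from the paper's code: the helper names, the random vectors, and the dimension `D = 128` are assumptions. It relies on the standard identity that for unit-norm vectors the squared Euclidean distance equals twice the cosine distance.

```python
import math
import random

def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def euclidean_distance(a, b):
    """Plain Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """Cosine distance; assumes both inputs are already unit-norm,
    so the cosine similarity reduces to the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

random.seed(0)
D = 128  # hypothetical embedding dimension, chosen for illustration
a = l2_normalize([random.gauss(0, 1) for _ in range(D)])
b = l2_normalize([random.gauss(0, 1) for _ in range(D)])

d_euc = euclidean_distance(a, b)
d_cos = cosine_distance(a, b)

# On the unit hypersphere: ||a - b||^2 = 2 - 2 * <a, b> = 2 * d_cos
identity_holds = abs(d_euc ** 2 - 2.0 * d_cos) < 1e-9
print(identity_holds)
```

Because the mapping between the two distances is monotonic on the hypersphere, ranking neighbours by either distance gives the same order.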

Acknowledgements

This work has been supported by hardware donations from NVIDIA (DGX-1), by resources from Compute Canada, and in part by Bayer AG and the German federal ministry BMWi within the project “KI Absicherung”.

Author information

Authors and Affiliations

Heidelberg Collaboratory for Image Processing (HCI), Heidelberg University, Heidelberg, Germany
Timo Milbich, Karsten Roth & Björn Ommer
Mila, Universite de Montreal, Montreal, Canada
Karsten Roth, Samarth Sinha, Yoshua Bengio & Joseph Paul Cohen
Vector Institute, Toronto Robotics Institute, Toronto, Canada
Homanga Bharadhwaj
University of Toronto, Toronto, Canada
Homanga Bharadhwaj & Samarth Sinha
CIFAR, Toronto, Canada
Yoshua Bengio

Corresponding author

Correspondence to Timo Milbich.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Cite this paper

Milbich, T. et al. (2020). DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12353. Springer, Cham. https://doi.org/10.1007/978-3-030-58598-3_35

DOI: https://doi.org/10.1007/978-3-030-58598-3_35
Published: 07 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58597-6
Online ISBN: 978-3-030-58598-3
eBook Packages: Computer Science, Computer Science (R0)