ProxyNCA++: Revisiting and Revitalizing Proxy Neighborhood Component Analysis

Teh, Eu Wern; DeVries, Terrance; Taylor, Graham W.

doi:10.1007/978-3-030-58586-0_27

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12369)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We consider the problem of distance metric learning (DML), where the task is to learn an effective similarity measure between images. We revisit ProxyNCA and incorporate several enhancements. We find that low temperature scaling is a performance-critical component and explain why it works. We also find that Global Max Pooling generally works better than Global Average Pooling. Additionally, our proposed fast-moving proxies address the small-gradient issue of proxies, and this component synergizes well with low temperature scaling and Global Max Pooling. Our enhanced model, called ProxyNCA++, achieves a 22.9 percentage point average improvement in Recall@1 across four different zero-shot retrieval datasets compared to the original ProxyNCA algorithm. Furthermore, we achieve state-of-the-art results on the CUB200, Cars196, Sop, and InShop datasets, with Recall@1 scores of 72.2, 90.1, 81.4, and 90.9, respectively.
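The abstract names its components without giving formulas, so the following is only a rough sketch of how two of them might look, not the authors' implementation. It assumes a proxy-based loss computed as a temperature-scaled softmax over negative squared Euclidean distances between L2-normalized embeddings and proxies. All function names are hypothetical, and the temperature 1/9 is an illustrative value. Fast-moving proxies, which could be read as giving the proxies a larger learning rate than the backbone, are not shown.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def proxy_softmax_loss(embedding, proxies, label, temperature):
    """Negative log-probability of the correct proxy.

    Assumed formulation: softmax over -distance / temperature, where the
    distance is taken between the normalized embedding and each normalized proxy.
    """
    e = l2_normalize(embedding)
    logits = [-sq_dist(e, l2_normalize(p)) / temperature for p in proxies]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

def global_max_pool(feature_map):
    """Keep the largest spatial activation per channel."""
    return [max(channel) for channel in feature_map]

def global_avg_pool(feature_map):
    """Average the spatial activations per channel."""
    return [sum(channel) / len(channel) for channel in feature_map]

# Toy example: one embedding close to proxy 0, two class proxies.
embedding = [0.9, 0.1]
proxies = [[1.0, 0.0], [0.0, 1.0]]
loss_t1 = proxy_softmax_loss(embedding, proxies, 0, temperature=1.0)
loss_low = proxy_softmax_loss(embedding, proxies, 0, temperature=1.0 / 9.0)

# Toy feature map: 2 channels, 3 spatial positions each.
feature_map = [[1.0, 5.0, 2.0], [0.0, 3.0, 3.0]]
pooled_max = global_max_pool(feature_map)
pooled_avg = global_avg_pool(feature_map)
```

In this toy setting the lower temperature sharpens the softmax, so a sample that already sits near its own proxy gets a smaller loss than it does at temperature 1. Distances between unit vectors are bounded, which is one plausible reason an unscaled softmax stays soft.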

Notes

1. For additional experiments on different crop sizes, please refer to the corresponding supplementary materials.

Author information

Authors and Affiliations

University of Guelph, Guelph, ON, Canada
Eu Wern Teh, Terrance DeVries & Graham W. Taylor
Vector Institute, Toronto, ON, Canada
Eu Wern Teh, Terrance DeVries & Graham W. Taylor

Corresponding author

Correspondence to Eu Wern Teh.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 242 KB)

About this paper

Cite this paper

Teh, E.W., DeVries, T., Taylor, G.W. (2020). ProxyNCA++: Revisiting and Revitalizing Proxy Neighborhood Component Analysis. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12369. Springer, Cham. https://doi.org/10.1007/978-3-030-58586-0_27

DOI: https://doi.org/10.1007/978-3-030-58586-0_27
Published: 30 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58585-3
Online ISBN: 978-3-030-58586-0
eBook Packages: Computer Science, Computer Science (R0)
