A New Loss for Image Retrieval: Class Anchor Margin

Ghiţă, Alexandru; Ionescu, Radu Tudor

doi:10.1007/978-981-97-2253-2_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14646)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

The performance of neural networks in content-based image retrieval (CBIR) is highly influenced by the chosen loss (objective) function. The majority of objective functions for neural models can be divided into metric learning and statistical learning. Metric learning approaches require a pair mining strategy that often lacks efficiency, while statistical learning approaches do not generate highly compact features due to their indirect feature optimization. To this end, we propose a novel repeller-attractor loss that falls in the metric learning paradigm, yet directly optimizes for the \(L_{2}\) metric without the need to generate pairs. Our loss comprises three components. The leading objective ensures that the learned features are attracted to their designated learnable class anchors. The second component regulates the anchors and forces them to be separable by a margin, while the third ensures that the anchors do not collapse to zero. Furthermore, we develop a more efficient two-stage retrieval system by harnessing the learned class anchors during the first stage of the retrieval process, eliminating the need to compare the query with every image in the database. We establish a set of three datasets (CIFAR-100, Food-101, and ImageNet-200) and evaluate the proposed objective on the CBIR task, using both convolutional and transformer architectures. Our empirical evidence shows that, compared to existing objective functions, the proposed objective generates superior and more consistent results.
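The abstract names three loss components and a two-stage retrieval scheme but does not give their formulas. The following is a minimal sketch of how such a scheme might look, not the authors' implementation. The squared-hinge forms, the `margin` and `min_norm` defaults, the function names, and the `(embedding, label, item_id)` database layout are all assumptions made for illustration. In practice the anchors would be learnable parameters trained jointly with the network; here they are fixed lists so the example stays dependency-free.

```python
import math

def l2(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def anchor_loss(embeddings, labels, anchors, margin=1.0, min_norm=1.0):
    """Hypothetical three-term loss: attractor + repeller + minimum norm.

    The exact functional forms and default values are assumptions,
    not the formulas from the chapter.
    """
    # Attractor: pull each embedding toward the anchor of its own class.
    attract = sum(l2(e, anchors[y]) ** 2 for e, y in zip(embeddings, labels))
    attract /= len(embeddings)
    # Repeller: penalize anchor pairs that are closer than the margin.
    repel = 0.0
    for i in range(len(anchors)):
        for j in range(i + 1, len(anchors)):
            repel += max(0.0, margin - l2(anchors[i], anchors[j])) ** 2
    # Minimum norm: penalize anchors that drift toward the origin.
    origin = [0.0] * len(anchors[0])
    norm_pen = sum(max(0.0, min_norm - l2(a, origin)) ** 2 for a in anchors)
    return attract + repel + norm_pen

def two_stage_retrieve(query, anchors, database, top_k=2):
    """Stage 1: find the nearest anchor. Stage 2: rank only that class.

    `database` is a list of (embedding, label, item_id) tuples; this
    layout is an assumption made for the sketch.
    """
    nearest = min(range(len(anchors)), key=lambda c: l2(query, anchors[c]))
    candidates = [(l2(query, emb), item)
                  for emb, lab, item in database if lab == nearest]
    candidates.sort()
    return [item for _, item in candidates[:top_k]]

# Toy data: two classes in a 2-D embedding space.
anchors = [[2.0, 0.0], [0.0, 2.0]]
embeddings = [[2.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
loss = anchor_loss(embeddings, labels, anchors)

database = [([1.9, 0.1], 0, "a"), ([2.5, 0.0], 0, "b"), ([0.1, 2.0], 1, "c")]
hits = two_stage_retrieve([2.0, 0.1], anchors, database)
```

In this toy setup the anchors are already farther apart than the margin and far enough from the origin, so only the attractor term contributes to `loss`. The retrieval call compares the query with two anchors and then with the two items of the selected class, rather than with all three database entries.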

Author information

Authors and Affiliations

Department of Computer Science, University of Bucharest, Bucharest, Romania
Alexandru Ghiţă & Radu Tudor Ionescu

Corresponding author

Correspondence to Radu Tudor Ionescu.

Editor information

Editors and Affiliations

Taipei, Taiwan
De-Nian Yang
Microsoft Research Asia, Beijing, China
Xing Xie
National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Vincent S. Tseng
Duke University, Durham, NC, USA
Jian Pei
National Cheng Kung University, Tainan, Taiwan
Jen-Wei Huang
Silesian University of Technology, Gliwice, Poland
Jerry Chun-Wei Lin

About this paper

Cite this paper

Ghiţă, A., Ionescu, R.T. (2024). A New Loss for Image Retrieval: Class Anchor Margin. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol 14646. Springer, Singapore. https://doi.org/10.1007/978-981-97-2253-2_4

DOI: https://doi.org/10.1007/978-981-97-2253-2_4
Published: 25 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2252-5
Online ISBN: 978-981-97-2253-2
eBook Packages: Computer Science, Computer Science (R0)
