
Hierarchical Average Precision Training for Pertinent Image Retrieval

  • Conference paper
  • Published in: Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Image retrieval is commonly evaluated with Average Precision (AP) or Recall@k. Yet these metrics are limited to binary labels and do not take into account the severity of errors. This paper introduces a new hierarchical AP training method for pertinent image retrieval (HAPPIER). HAPPIER is based on a new \(\mathcal {H}\text {-AP}\) metric, which leverages a concept hierarchy to refine AP by integrating the importance of errors and better evaluating rankings. To train deep models with \(\mathcal {H}\text {-AP}\), we carefully study the problem’s structure and design a smooth lower-bound surrogate combined with a clustering loss that ensures consistent ordering. Extensive experiments on 6 datasets show that HAPPIER significantly outperforms state-of-the-art methods for hierarchical retrieval, while being on par with the latest approaches when evaluating fine-grained ranking performance. Finally, we show that HAPPIER leads to a better organization of the embedding space and prevents the most severe failure cases of non-hierarchical methods. Our code is publicly available at https://github.com/elias-ramzi/HAPPIER.
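To make the abstract's idea concrete, the sketch below contrasts standard binary AP with a graded-relevance variant. This is an illustrative simplification, not the paper's exact \(\mathcal {H}\text {-AP}\) definition: the function names and the specific graded formula (items ranked above position k count at most as much as the relevance of the item at k) are assumptions made for demonstration.

```python
def average_precision(rels):
    """Standard binary AP: rels is a ranked list of 0/1 relevance labels."""
    hits, total = 0, 0.0
    for k, r in enumerate(rels, start=1):
        if r:
            hits += 1
            total += hits / k  # precision at each relevant rank
    return total / max(hits, 1)


def graded_ap(rels):
    """Illustrative graded extension of AP (a hypothetical simplification).

    rels holds relevances in [0, 1], e.g. derived from the depth of the
    deepest shared ancestor in a concept hierarchy.  An item ranked above
    position k contributes at most min(its relevance, rels[k]), so placing
    a partially relevant item above a fully relevant one is penalised less
    than placing a completely irrelevant one there.
    """
    total_rel = sum(rels)
    if total_rel == 0:
        return 0.0
    score = 0.0
    for k, r in enumerate(rels, start=1):
        if r > 0:
            # "graded rank" of position k: capped contributions from above
            graded = sum(min(r, rj) for rj in rels[:k])
            score += r * graded / k
    return score / total_rel
```

With binary labels, `graded_ap` reduces to `average_precision`; with graded labels, ranking a 1.0-relevant item above a 0.5-relevant one scores higher than the reverse, which is the kind of error-severity awareness the abstract describes.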


Notes

  1.

    For the sake of readability, our notations are given for a single query. During training, HAPPIER optimizes the hierarchical retrieval objective averaged over several queries.

  2.

    CSL’s scores in Table 2 are above those reported in [32]; personal discussions with the authors of [32] confirm that our results for CSL are valid, see supplementary B.5.

References

  1. Bertinetto, L., Mueller, R., Tertikas, K., Samangooei, S., Lord, N.A.: Making better mistakes: leveraging class hierarchies with deep networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12506–12515 (2020)

  2. Brown, A., Xie, W., Kalogeiton, V., Zisserman, A.: Smooth-AP: smoothing the path towards large-scale image retrieval. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 677–694. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_39

  3. Bruch, S., Zoghi, M., Bendersky, M., Najork, M.: Revisiting approximate metric optimization in the age of deep neural networks. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1241–1244 (2019)

  4. Burges, C., et al.: Learning to rank using gradient descent. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 89–96. ICML 2005, Association for Computing Machinery, New York, NY, USA (2005). https://doi.org/10.1145/1102351.1102363

  5. Burges, C., Ragno, R., Le, Q.: Learning to rank with nonsmooth cost functions. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems, vol. 19. MIT Press (2006). https://proceedings.neurips.cc/paper/2006/file/af44c4c56f385c43f2529f9b1b018f6a-Paper.pdf

  6. Cakir, F., He, K., Xia, X., Kulis, B., Sclaroff, S.: Deep metric learning to rank. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1861–1870 (2019)

  7. Chan, D.M., Rao, R., Huang, F., Canny, J.F.: GPU accelerated t-distributed stochastic neighbor embedding. J. Parallel Distrib. Comput. 131, 1–13 (2019)

  8. Chang, D., Pang, K., Zheng, Y., Ma, Z., Song, Y.Z., Guo, J.: Your “flamingo” is my “bird”: fine-grained, or not. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11476–11485 (2021)

  9. Chapelle, O., Chang, Y.: Yahoo! learning to rank challenge overview. In: Proceedings of the learning to rank challenge, pp. 1–24. PMLR (2011)

  10. Croft, W.B., Metzler, D., Strohman, T.: Search engines: information retrieval in practice, vol. 520. Addison-Wesley Reading (2010)

  11. Deng, J., Guo, J., Xue, N., Zafeiriou, S.: ArcFace: additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699 (2019)

  12. Dhall, A., Makarova, A., Ganea, O., Pavllo, D., Greeff, M., Krause, A.: Hierarchical image classification using entailment cone embeddings. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 836–837 (2020)

  13. Dupret, G., Piwowarski, B.: A user behavior model for average precision and its generalization to graded judgments. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 531–538. SIGIR 2010, Association for Computing Machinery, New York, NY, USA (2010). https://doi.org/10.1145/1835449.1835538

  14. Dupret, G., Piwowarski, B.: Model based comparison of discounted cumulative gain and average precision. J. Discrete Algorithms 18, 49–62 (2013). https://doi.org/10.1016/j.jda.2012.10.002. https://www.sciencedirect.com/science/article/pii/S1570866712001372 Selected papers from the 18th International Symposium on String Processing and Information Retrieval (SPIRE 2011)

  15. Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), vol. 2, pp. 1735–1742. IEEE (2006)

  16. Hjørland, B.: The foundation of the concept of relevance. J. Am. Soc. Inform. Sci. Technol. 61(2), 217–237 (2010)

  17. Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. (TOIS) 20(4), 422–446 (2002)

  18. Järvelin, K., Kekäläinen, J.: IR evaluation methods for retrieving highly relevant documents. In: ACM SIGIR Forum, vol. 51, pp. 243–250. ACM New York, NY, USA (2017)

  19. Kekäläinen, J., Järvelin, K.: Using graded relevance assessments in IR evaluation. J. Am. Soc. Inf. Sci. Technol. 53(13), 1120–1129 (2002). https://doi.org/10.1002/asi.10137. https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.10137

  20. Law, M.T., Thome, N., Cord, M.: Learning a distance metric from relative comparisons between quadruplets of images. Int. J. Comput. Vision 121(1), 65–94 (2017)

  21. Movshovitz-Attias, Y., Toshev, A., Leung, T.K., Ioffe, S., Singh, S.: No fuss distance metric learning using proxies. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 360–368 (2017)

  22. Oh Song, H., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4004–4012 (2016)

  23. P., M.V., Paulus, A., Musil, V., Martius, G., Rolínek, M.: Differentiation of blackbox combinatorial solvers. In: ICLR (2020)

  24. Qin, T., Liu, T.: Introducing LETOR 4.0 datasets. arXiv preprint arXiv:1306.2597 (2013)

  25. Qin, T., Liu, T.Y., Li, H.: A general approximation framework for direct optimization of information retrieval measures. Inf. Retrieval 13, 375–397 (2009)

  26. Radenović, F., Tolias, G., Chum, O.: CNN image retrieval learns from BoW: unsupervised fine-tuning with hard examples. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 3–20. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_1

  27. Ramzi, E., Thome, N., Rambour, C., Audebert, N., Bitot, X.: Robust and decomposable average precision for image retrieval. Advances in Neural Information Processing Systems 34 (2021)

  28. Revaud, J., Almazán, J., Rezende, R.S., Souza, C.R.D.: Learning with average precision: training image retrieval with a listwise loss. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5107–5116 (2019)

  29. Robertson, S.E., Kanoulas, E., Yilmaz, E.: Extending average precision to graded relevance judgments. In: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, pp. 603–610 (2010)

  30. Rolínek, M., Musil, V., Paulus, A., Vlastelica, M., Michaelis, C., Martius, G.: Optimizing rank-based metrics with blackbox differentiation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7620–7630 (2020)

  31. Sohn, K.: Improved deep metric learning with multi-class n-pair loss objective. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc. (2016). https://proceedings.neurips.cc/paper/2016/file/6b180037abbebea991d8b1232f8a8ca9-Paper.pdf

  32. Sun, Y., et al.: Dynamic metric learning: towards a scalable metric space to accommodate multiple semantic scales. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5393–5402 (2021)

  33. Taylor, M., Guiver, J., Robertson, S., Minka, T.: SoftRank: optimizing non-smooth rank metrics. In: Proceedings of the 2008 International Conference on Web Search and Data Mining, pp. 77–86. WSDM 2008, Association for Computing Machinery, New York, NY, USA (2008). https://doi.org/10.1145/1341531.1341544

  34. Teh, E.W., DeVries, T., Taylor, G.W.: ProxyNCA++: revisiting and revitalizing proxy neighborhood component analysis. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12369, pp. 448–464. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58586-0_27

  35. van der Maaten, L., Hinton, G.: Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)

  36. Van Horn, G., et al.: The iNaturalist species classification and detection dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8769–8778 (2018)

  37. Wang, H., et al.: CosFace: large margin cosine loss for deep face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5265–5274 (2018)

  38. Wang, X., Han, X., Huang, W., Dong, D., Scott, M.R.: Multi-similarity loss with general pair weighting for deep metric learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5022–5030 (2019)

  39. Wang, X., Zhang, H., Huang, W., Scott, M.R.: Cross-batch memory for embedding learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6388–6397 (2020)

  40. Wu, C.Y., Manmatha, R., Smola, A.J., Krahenbuhl, P.: Sampling matters in deep embedding learning. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2840–2848 (2017)

  41. Zhai, A., Wu, H.Y.: Classification is a strong baseline for deep metric learning. arXiv preprint arXiv:1811.12649 (2018)

Acknowledgement

This work was done under a grant from the AHEAD ANR program (ANR-20-THIA-0002). It was granted access to the HPC resources of IDRIS under the allocation 2021-AD011012645 made by GENCI.

Author information

Corresponding author

Correspondence to Elias Ramzi.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 5596 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ramzi, E., Audebert, N., Thome, N., Rambour, C., Bitot, X. (2022). Hierarchical Average Precision Training for Pertinent Image Retrieval. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13674. Springer, Cham. https://doi.org/10.1007/978-3-031-19781-9_15

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-19781-9_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19780-2

  • Online ISBN: 978-3-031-19781-9

  • eBook Packages: Computer Science, Computer Science (R0)
