Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12354)

Abstract

Optimising a ranking-based metric, such as Average Precision (AP), is notoriously challenging because it is non-differentiable and hence cannot be optimised directly with gradient-descent methods. To this end, we introduce an objective that instead optimises a smoothed approximation of AP, coined Smooth-AP. Smooth-AP is a plug-and-play objective function that allows for end-to-end training of deep networks with a simple and elegant implementation. We also present an analysis of why directly optimising the ranking-based metric of AP offers benefits over other deep metric learning losses.
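The idea of a smoothed approximation of AP can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the smoothing is done, so the sketch assumes that the step function used to count higher-scored items is relaxed into a temperature-scaled sigmoid. The function names (`exact_ap`, `smooth_ap`) and the parameter `tau` are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(x, tau):
    """Temperature-scaled logistic function (assumed relaxation of the step function)."""
    z = max(-60.0, min(60.0, x / tau))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def exact_ap(scores, labels):
    """Average Precision for one query, written with hard (non-differentiable) comparisons."""
    positives = [i for i, y in enumerate(labels) if y]
    total = 0.0
    for i in positives:
        # Rank of item i among the positives only, and among all items.
        rank_pos = 1 + sum(1 for j in positives if j != i and scores[j] > scores[i])
        rank_all = 1 + sum(1 for j in range(len(scores)) if j != i and scores[j] > scores[i])
        total += rank_pos / rank_all
    return total / len(positives)


def smooth_ap(scores, labels, tau=0.01):
    """Same formula, with each hard comparison replaced by a sigmoid of the score difference."""
    positives = [i for i, y in enumerate(labels) if y]
    total = 0.0
    for i in positives:
        rank_pos = 1.0 + sum(sigmoid(scores[j] - scores[i], tau)
                             for j in positives if j != i)
        rank_all = 1.0 + sum(sigmoid(scores[j] - scores[i], tau)
                             for j in range(len(scores)) if j != i)
        total += rank_pos / rank_all
    return total / len(positives)


# Toy query: similarity scores for four gallery items, two of them relevant.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
ap_hard = exact_ap(scores, labels)
ap_soft = smooth_ap(scores, labels, tau=0.01)
```

In this sketch a small `tau` keeps the smoothed value close to the exact AP, while a larger `tau` gives a looser approximation whose value varies more gradually with the scores. In a training setting the quantity minimised would be one minus the smoothed AP, computed with an automatic-differentiation framework rather than plain Python.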

We apply Smooth-AP to standard retrieval benchmarks: Stanford Online Products and VehicleID, and also evaluate on larger-scale datasets: INaturalist for fine-grained category retrieval, and VGGFace2 and IJB-C for face retrieval. In all cases, we improve the performance over the state-of-the-art, especially for larger-scale datasets, thus demonstrating the effectiveness and scalability of Smooth-AP to real-world scenarios.

Notes

Acknowledgements

We are grateful to Tengda Han, Olivia Wiles, Christian Rupprecht, Sagar Vaze, Quentin Pleple and Maya Gulieva for proof-reading, and to Ernesto Coto for the initial motivation for this work. Funding for this research is provided by the EPSRC Programme Grant Seebibyte EP/M013774/1. AB is funded by an EPSRC DTA Studentship.

Supplementary material

Supplementary material 1 (pdf 13360 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Visual Geometry Group, University of Oxford, Oxford, UK
