Repeatability Is Not Enough: Learning Affine Regions via Discriminability

  • Dmytro Mishkin
  • Filip Radenović
  • Jiří Matas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)

Abstract

A method for learning local affine-covariant regions is presented. We show that maximizing geometric repeatability does not lead to local regions, a.k.a. features, that are reliably matched, which necessitates descriptor-based learning. We explore the factors that influence such learning and registration: the loss function, the descriptor type, the geometric parametrization, and the trade-off between matchability and geometric accuracy, and we propose a novel hard negative-constant loss function for learning affine regions. The affine shape estimator, AffNet, trained with the hard negative-constant loss outperforms the state of the art in bag-of-words image retrieval and wide-baseline stereo. The proposed training process does not require precisely geometrically aligned patches. The source code and trained weights are available at https://github.com/ducha-aiki/affnet.
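The hard negative-constant loss can be summarized in a few lines: it is a hardest-in-batch triplet margin loss in which the distance to the hardest negative is treated as a constant, so no gradient flows through it. Below is a minimal PyTorch sketch under our own assumptions; the function name, the single-sided negative mining, and the margin value are illustrative simplifications, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def hard_negative_constant_loss(anchors, positives, margin=1.0):
    """Illustrative sketch of a hard negative-constant triplet loss.

    `anchors` and `positives` are (n, d) L2-normalized descriptors;
    row i of `positives` is the match of row i of `anchors`.
    """
    # Pairwise L2 distances between anchor and positive descriptors.
    dist = torch.cdist(anchors, positives)          # (n, n)
    n = dist.size(0)
    pos = dist.diag()                               # matching-pair distances
    # Exclude the true match before mining the hardest negative.
    off_diag = dist + torch.eye(n, device=dist.device, dtype=dist.dtype) * 1e8
    neg = off_diag.min(dim=1).values                # hardest in-batch negative
    # Defining detail: detach() treats the negative distance as a constant,
    # so the loss only pulls matching pairs together.
    return F.relu(margin + pos - neg.detach()).mean()
```

Blocking the gradient through the negative distance is the key design choice: the network is only rewarded for bringing matching regions closer, not for pushing non-matching ones apart, which in our reading is how the loss balances matchability against geometric accuracy.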

Keywords

Local features · Affine shape · Loss function · Image retrieval

Acknowledgements

The authors were supported by the Czech Science Foundation Project GACR P103/12/G084, the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry of Science, Research and Economy, and the Province of Upper Austria in the frame of the COMET center SCCH, the CTU student grant SGS17/185/OHK3/3T/13, and the MSMT LL1303 ERC-CZ grant.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Visual Recognition Group, Center for Machine Perception, FEE, CTU in Prague, Prague, Czech Republic
