OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12360)

Abstract

In this paper, we consider the task of one-shot object detection, which consists of detecting objects defined by a single demonstration. Unlike standard object detection, the classes of objects used for training and testing do not overlap. We build a one-stage system that performs localization and recognition jointly. We use dense correlation matching of learned local features to find correspondences, a feed-forward geometric transformation model to align features, and bilinear resampling of the correlation tensor to compute the detection score of the aligned features. All the components are differentiable, which allows end-to-end training. Experimental evaluation on several challenging domains (retail products, 3D objects, buildings and logos) shows that our method can detect unseen classes (e.g., toothpaste when trained on groceries) and outperforms several baselines by a significant margin. Our code is available online: https://github.com/aosokin/os2d.
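
The matching and scoring steps named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation (see the repository linked above for that). It assumes L2-normalized local features, uses a fixed translation in place of the learned feed-forward transformation model, and all function names (`l2_normalize`, `dense_correlation`, `bilinear_sample`) are hypothetical.

```python
import numpy as np

def l2_normalize(features, eps=1e-8):
    """Scale each local feature vector (channel axis 0) to unit length."""
    norm = np.linalg.norm(features, axis=0, keepdims=True)
    return features / (norm + eps)

def dense_correlation(class_features, input_features):
    """Correlate every class-image location with every input-image location.

    class_features: (C, Hc, Wc), input_features: (C, Hi, Wi).
    Returns a (Hc, Wc, Hi, Wi) tensor of cosine similarities.
    """
    a = l2_normalize(class_features)
    b = l2_normalize(input_features)
    return np.einsum("cij,ckl->ijkl", a, b)

def bilinear_sample(grid, y, x):
    """Bilinearly interpolate a 2-D array at a fractional (y, x) position."""
    h, w = grid.shape
    y = float(np.clip(y, 0, h - 1))
    x = float(np.clip(x, 0, w - 1))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * grid[y0, x0] + wx * grid[y0, x1]
    bottom = (1 - wx) * grid[y1, x0] + wx * grid[y1, x1]
    return (1 - wy) * top + wy * bottom

# Toy data: paste the class features into the input map at offset (4, 2).
rng = np.random.default_rng(0)
class_features = rng.normal(size=(8, 3, 3))
input_features = rng.normal(size=(8, 10, 10))
input_features[:, 4:7, 2:5] = class_features

corr = dense_correlation(class_features, input_features)

# Assumed alignment (a pure translation here): class location (i, j)
# maps to input location (i + 4, j + 2). The detection score is the mean
# of the correlation tensor resampled at the aligned positions.
score = np.mean([
    bilinear_sample(corr[i, j], i + 4, j + 2)
    for i in range(3) for j in range(3)
])
print(corr.shape, round(float(score), 4))
```

In this toy setting the score at the correct alignment is close to 1, since each class feature is correlated with an identical copy of itself. Each step is a composition of differentiable operations, which is the property the abstract relies on for end-to-end training.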

Keywords

One-shot detection, Object detection, Few-shot learning

Notes

Acknowledgments

We would like to personally thank Ignacio Rocco, Relja Arandjelović, Andrei Bursuc, Irina Saparina and Ekaterina Glazkova for amazing discussions and insightful comments, without which this project would not have been possible. This research was partly supported by Samsung Research, Samsung Electronics, by the Russian Science Foundation grant 19-71-00082 and through computational resources of HPC facilities at NRU HSE.

Supplementary material

504470_1_En_38_MOESM1_ESM.pdf (4.4 mb)
Supplementary material 1 (pdf 4457 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. National Research University Higher School of Economics, Moscow, Russia
  2. Yandex, Moscow, Russia
  3. mirum.io, Moscow, Russia
