Object instance identification with fully convolutional networks

  • Maxime Portaz
  • Matthias Kohl
  • Jean-Pierre Chevallet
  • Georges Quénot
  • Philippe Mulhem

Abstract

This paper presents a novel approach to instance search and object detection, applied to museum visits. The approach relies on fully convolutional networks (FCN) to obtain region proposals and object representations. Our proposal consists of four steps: first, a classical convolutional network is fine-tuned as a classifier on the dataset; second, a fully convolutional network is built from it and trained as a classifier that covers all regions of the corpus images; third, this network is used to define global image descriptors within a siamese architecture trained on triplets of images; finally, these descriptors are used for retrieval with a classical scalar product between vectors. Our framework has the following features: i) it is well suited to small datasets with low object variability, as it relies on transfer learning; ii) it requires no additional component in the network, as it uses only classical (i.e. not fully convolutional) and fully convolutional networks; and iii) it needs no region annotations in the dataset, as it handles regions in an unsupervised way. Through multiple experiments on two image datasets taken from museum visits, we detail the effect of each parameter and show that the descriptors obtained with our proposed network outperform those of previous state-of-the-art approaches.
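The retrieval pipeline described above can be summarised in a few lines of code. The following is a minimal sketch, not the authors' implementation: it assumes a PyTorch setting and a ResNet-18 backbone, and the names FCNDescriptor, train_step and retrieve are hypothetical. It illustrates the three ingredients of the retrieval stage: a classifier backbone made fully convolutional (the final fully connected layer replaced by a 1x1 convolution), a global descriptor trained with a triplet loss in a siamese fashion, and retrieval by scalar product between normalised descriptors.

    # Illustrative sketch only; backbone, dimensions and names are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torchvision.models as models

    class FCNDescriptor(nn.Module):
        """Fully convolutional descriptor network (hypothetical example)."""
        def __init__(self, dim=512):
            super().__init__()
            backbone = models.resnet18(weights=None)  # the paper fine-tunes a pretrained classifier
            # Drop the average pooling and fully connected layers to keep a fully convolutional trunk.
            self.features = nn.Sequential(*list(backbone.children())[:-2])
            # A 1x1 convolution plays the role of the former fully connected layer,
            # so the network accepts images of any size.
            self.proj = nn.Conv2d(512, dim, kernel_size=1)

        def forward(self, x):
            fmap = self.proj(self.features(x))                # dense activations over all regions
            desc = F.adaptive_max_pool2d(fmap, 1).flatten(1)  # pooled into one global descriptor
            return F.normalize(desc, dim=1)                   # unit norm: dot product = cosine similarity

    triplet_loss = nn.TripletMarginLoss(margin=0.1)

    def train_step(model, optimizer, anchor, positive, negative):
        """One siamese/triplet update: all three branches share the same weights."""
        optimizer.zero_grad()
        loss = triplet_loss(model(anchor), model(positive), model(negative))
        loss.backward()
        optimizer.step()
        return loss.item()

    def retrieve(query_desc, gallery_descs, k=5):
        """Rank gallery images by scalar product with the query descriptor."""
        scores = gallery_descs @ query_desc   # (N,) similarity scores
        return torch.topk(scores, k).indices

Because the descriptors are L2-normalised, ranking by scalar product is equivalent to ranking by cosine similarity, which is why retrieval reduces to a single matrix-vector product over the gallery.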

Keywords

Fully convolutional network · Triplet loss · Siamese network · Instance search · Image retrieval

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. CNRS, Grenoble-INP, LIG, University Grenoble Alpes, Grenoble, France
