Adaptive Image Representation Using Information Gain and Saliency: Application to Cultural Heritage Datasets

  • Dorian Michaud
  • Thierry Urruty
  • François Lecellier
  • Philippe Carré
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10704)


Recently, deep neural networks have shown great performance on supervised image analysis tasks. However, expert image datasets with little information or prior knowledge still need indexing tools that best represent the experts' wishes. Our work fits this very specific application context, in which only a few expert users can appropriately label the images. Thus, in this paper, we consider small expert collections with no associated relevant label set nor structured knowledge. In this context, we propose an automatic and adaptive framework, based on the well-known bag of visual words and phrases models, that selects relevant visual descriptors for each keypoint to construct a more discriminating image representation. Within this framework, we mix an information gain model with visual saliency information to enhance the image representation. Experimental results show the adaptiveness and performance of our unsupervised framework on well-known "generic" datasets as well as on a cultural heritage expert dataset.
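The core idea of a bag-of-visual-words representation with per-keypoint weighting can be sketched as follows. This is a minimal illustration, not the paper's exact method: the vocabulary, the descriptors, and the `weights` argument (standing in for a hypothetical mix of saliency and information gain scores) are all assumptions for the example.

```python
import numpy as np

def bovw_histogram(descriptors, vocabulary, weights=None):
    """Assign each local descriptor to its nearest visual word and
    accumulate an (optionally weighted) histogram over the vocabulary.

    descriptors: (n, d) array of local features (e.g. SIFT at keypoints)
    vocabulary:  (k, d) array of visual-word centroids (e.g. from k-means)
    weights:     (n,) per-keypoint weights; a saliency/information-gain
                 mix could be plugged in here (hypothetical weighting,
                 not the paper's exact combination)
    """
    # Squared Euclidean distance from every descriptor to every centroid
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    assignments = dists.argmin(axis=1)
    if weights is None:
        weights = np.ones(len(descriptors))
    hist = np.zeros(len(vocabulary))
    np.add.at(hist, assignments, weights)  # weighted vote per visual word
    # L1-normalise so images with different keypoint counts are comparable
    total = hist.sum()
    return hist / total if total > 0 else hist
```

With uniform weights this reduces to the classical bag-of-visual-words histogram; non-uniform weights let salient or informative keypoints contribute more to the final image signature.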


Keywords: Cultural heritage collection · Content-Based Image Retrieval · Information gain · Visual saliency



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Dorian Michaud (1, 2)
  • Thierry Urruty (1)
  • François Lecellier (1)
  • Philippe Carré (1)
  1. CNRS, Univ. Poitiers, XLIM, UMR 7252, Poitiers, France
  2. Quadra Informatique, Phalempin, France
