Multimedia Tools and Applications, Volume 69, Issue 2, pp 443–469

Retina enhanced SURF descriptors for spatio-temporal concept detection

  • Sabin Tiberius Strat
  • Alexandre Benoit
  • Patrick Lambert
  • Alice Caplier


This paper investigates the potential benefit of low-level human vision behaviors in the context of high-level semantic concept detection. A large part of current approaches relies on the Bag-of-Words (BoW) model, which has proven to be a good choice, especially for object recognition in images. Its extension from static images to video sequences raises new problems, chiefly how to exploit the temporal information related to the concepts to detect (swimming, drinking...). In this study, we apply a human retina model to preprocess video sequences before performing the state-of-the-art BoW analysis. This preprocessing, designed to enhance relevant information, increases performance by introducing robustness to traditional image and video problems such as luminance variation, shadows, compression artifacts and noise. Additionally, we propose a new segmentation method that selects low-level spatio-temporal potential areas of interest from the visual scene, without slowing the computation as much as a high-level saliency model would. These approaches are evaluated on the TrecVid 2010 and 2011 Semantic Indexing Task datasets, containing from 130 to 346 high-level semantic concepts. We also experiment with various parameter settings to check their effect on performance.
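As a rough illustration of the pipeline the abstract describes (retina-inspired preprocessing followed by Bag-of-Words encoding), the sketch below pairs a crude Naka-Rushton-style luminance compression, a simplistic stand-in for the full retina model, with nearest-codeword BoW histogramming. All function names and the toy data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def retina_like_compression(frame, eps=1e-6):
    """Crude stand-in for retinal photoreceptor adaptation:
    Naka-Rushton-style compression L / (L + mean(L)), which reduces
    global luminance variation while preserving local contrast."""
    mean_l = frame.mean()
    return frame / (frame + mean_l + eps)

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors against a visual codebook and return
    an L1-normalized Bag-of-Words histogram."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest codeword
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy run: a synthetic 8x8 "frame" and 64-D "SURF-like" descriptors.
rng = np.random.default_rng(0)
frame = rng.uniform(0, 255, size=(8, 8))
compressed = retina_like_compression(frame)  # values mapped into [0, 1)
desc = rng.normal(size=(10, 64))
codebook = rng.normal(size=(4, 64))
h = bow_histogram(desc, codebook)  # 4-bin normalized BoW histogram
```

In the paper's actual pipeline, the codebook would be learned with k-means over SURF descriptors extracted from retina-preprocessed frames; this sketch only shows the encoding step once such a codebook exists.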


Keywords: Bag of Words · Retina enhancement · Low-level saliency · Semantics · SURF · Video indexation



This work would not have been possible without the IRIM French consortium, which provided the processing toolchain for the unified descriptors evaluation.



Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  • Sabin Tiberius Strat 1, 2
  • Alexandre Benoit 1
  • Patrick Lambert 1
  • Alice Caplier 3

  1. LISTIC, Université de Savoie, Annecy-le-Vieux, France
  2. LAPI, University “Politechnica” of Bucharest, Bucharest, Romania
  3. Gipsa-Lab, Université de Grenoble, St Martin d’Hères, France
