An Attention-Driven Model for Grouping Similar Images with Image Retrieval Applications

  • Oge Marques (email author)
  • Liam M. Mayron
  • Gustavo B. Borba
  • Humberto R. Gamba
Open Access
Research Article
Part of the following topical collections:
  1. Image Perception


Recent work in the computational modeling of visual attention has demonstrated that a purely bottom-up approach to identifying salient regions within an image can be successfully applied to diverse and practical problems, from target recognition to the placement of advertisements. This paper applies a combination of computational models of visual attention to the image retrieval problem. We demonstrate that certain shortcomings of existing content-based image retrieval solutions can be addressed by a biologically motivated, unsupervised way of grouping together images whose salient regions of interest (ROIs) are perceptually similar, regardless of the visual contents of other (less relevant) parts of the image. We propose a model in which only the salient regions of an image are encoded as ROIs; their features are then compared against those of previously seen ROIs, and cluster membership is assigned accordingly. Experimental results show that the proposed approach works well for several combinations of feature extraction techniques and clustering algorithms, suggesting promising avenues for future improvement, such as the addition of a top-down component and the inclusion of a relevance feedback mechanism.
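The pipeline the abstract describes (compute a saliency map, crop the salient ROI, extract features from the ROI alone, then cluster the feature vectors) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: `saliency_map` here is a crude intensity center-surround stand-in for a full Itti-Koch model, and all function names and parameters (`frac`, `bins`) are hypothetical.

```python
import numpy as np

def saliency_map(img):
    # Crude stand-in for a bottom-up saliency model (e.g., Itti-Koch):
    # pixels whose intensity deviates from the global mean count as salient.
    gray = img.mean(axis=2)
    return np.abs(gray - gray.mean())

def extract_roi(img, sal, frac=0.05):
    # Keep the top `frac` most salient pixels and crop their bounding box.
    thresh = np.quantile(sal, 1.0 - frac)
    ys, xs = np.where(sal >= thresh)
    return img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def roi_feature(roi, bins=4):
    # Describe the ROI alone with a normalized per-channel color histogram;
    # the rest of the image never enters the feature vector.
    hists = [np.histogram(roi[..., c], bins=bins, range=(0.0, 1.0))[0]
             for c in range(roi.shape[2])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def kmeans(X, k, iters=20):
    # Minimal k-means; any unsupervised clustering algorithm would do here.
    centers = X[np.linspace(0, len(X) - 1, k, dtype=int)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def synth(color, seed):
    # Toy image: a colored square (the salient object) on a noisy gray field.
    rng = np.random.default_rng(seed)
    img = np.full((32, 32, 3), 0.5) + rng.normal(0.0, 0.02, (32, 32, 3))
    img[10:20, 10:20] = color
    return img.clip(0.0, 1.0)

# Three "red object" and three "blue object" images with differing backgrounds.
images = [synth((1, 0, 0), s) for s in range(3)] + \
         [synth((0, 0, 1), s) for s in range(3)]
feats = np.array([roi_feature(extract_roi(im, saliency_map(im)))
                  for im in images])
labels = kmeans(feats, k=2)
```

Because only the ROI is encoded, the two groups separate cleanly even though the backgrounds differ from image to image, which is the core idea of the proposed grouping scheme.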


Keywords: Feature Extraction, Visual Attention, Image Retrieval, Similar Image, Relevance Feedback



Copyright information

© Marques et al. 2007

Authors and Affiliations

  • Oge Marques (1, email author)
  • Liam M. Mayron (1)
  • Gustavo B. Borba (2)
  • Humberto R. Gamba (2)
  1. Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, USA
  2. Programa de Pós-Graduação em Engenharia Elétrica e Informática Industrial, Universidade Tecnológica Federal do Paraná (UTFPR), Curitiba, Brazil
