Attentive Systems: A Survey

Abstract

Visual saliency analysis detects the salient regions or objects that attract human attention in natural scenes. It has been studied intensively in fields such as computer vision, computer graphics, and multimedia. Although many computational saliency models exist, a focused study of which applications benefit from saliency, and how, is still lacking. In this article, our goal is therefore to provide a comprehensive review of applications that use saliency cues, the so-called attentive systems. We aim to give a broad view of saliency applications and of what visual saliency can do. We categorize the large body of applications into areas such as computer vision, computer graphics, and multimedia. Covering 200+ publications, we survey (1) key application trends, (2) the role of visual saliency, and (3) the usability of saliency in different tasks.

Notes

  1. https://scholar.google.com.sg/citations?view_op=top_venues&hl=en&vq=eng.

  2. An Italian painter best known for creating imaginative portrait heads made entirely of objects such as fruits, vegetables, flowers, fish, and books.

  3. http://www.google.com/glass.

  4. http://www.eyetracking-glasses.com/.

  5. http://www.tobii.com/en/eye-tracking-research/global/landingpages/tobii-glasses-2/.

References

  • Achanta, R., Hemami, S. S., Estrada, F. J., & Süsstrunk, S. (2009). Frequency-tuned salient region detection. In IEEE conference on computer vision and pattern recognition (pp. 1597–1604).

  • Alexe, B., Deselaers, T., & Ferrari, V. (2012). Measuring the objectness of image windows. IEEE Transactions on Pattern Analysis and Machine Intelligenc, 34(11), 2189–2202.

    Article  Google Scholar 

  • Alkan, S., & Cagiltay, K. (2007). Studying computer game learning experience through eye tracking. BJET, 38(3), 538–542.

    Article  Google Scholar 

  • Avidan, S., & Shamir, A. (2007). Seam carving for content-aware image resizing. ACM Transactions on Graphics, 26(3), 10.

    Article  Google Scholar 

  • Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. CoRR (abs/1409.0473).

  • Bailey, R., McNamara, A., Sudarsanam, N., & Grimm, C. (2009). Subtle gaze direction. ACM Transactions on Graphics, 28(4), 100.

    Article  Google Scholar 

  • Baluja, S., & Pomerleau, D. A. (1997). Expectation-based selective attention for visual monitoring and control of a robot vehicle. Robotics and Autonomous Systems, 22(3), 329–344.

    MATH  Article  Google Scholar 

  • Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T.(2010). icoseg: Interactive co-segmentation with intelligent scribble guidance. In IEEE conference on computer vision and pattern recognition (pp. 3169–3176).

  • Belardinelli, A. (2008). Salience features selection: Deriving a model from human evidence. Ph.D. thesis, Sapienza Universita di Roma, Rome, Italy.

  • Bhattacharya, S., Sukthankar, R., & Shah, M. (2010). A framework for photo-quality assessment and enhancement based on visual aesthetics. In ACM multimedia conference (pp. 271–280).

  • Borji, A., Cheng, M., Jiang, H., & Li, J. (2015). Salient object detection: A benchmark. IEEE Transactions on Image Processing, 24(12), 5706–5722.

    MathSciNet  Article  Google Scholar 

  • Borji, A., Frintrop, S., Sihite, D. N., & Itti, L. (2012). Adaptive object tracking by learning background context. In IEEE conference on computer vision and pattern recognition workshops (pp. 23–30).

  • Borji, A. & Itti, L. (2011). Scene classification with a sparse set of salient regions. In IEEE international conference on robotics and automation (pp. 1902–1908).

  • Borji, A., & Itti, L. (2013). State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 185–207.

    Article  Google Scholar 

  • Borji, A., Sihite, D. N., & Itti, L. (2012). Salient object detection: A benchmark. In European conference on computer vision (pp. 414–429).

  • Boykov, Y., & Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 1124–1137.

    MATH  Article  Google Scholar 

  • Breazeal, C., & Scassellati, B. (1999). A context-dependent attention system for a social robot. In International joint conference on artificial intelligence (pp. 1146–1153).

  • Bruce, N., & Tsotsos, J. (2005). Saliency based on information maximization. In Advances in neural information processing systems.

  • Butko, N., & Movellan, J. (2009). Optimal scanning for faster object detection. In IEEE conference on computer vision and pattern recognition (pp. 2751–2758).

  • Chamaret, C., & Le Meur, O. (2008). Attention-based video reframing: Validation using eye-tracking. In International conference on pattern recognition (pp. 1–4).

  • Chen, J., & Ji, Q. (2011). Probabilistic gaze estimation without active personal calibration. In IEEE conference on computer vision and pattern recognition, CVPR (pp. 609–616).

  • Chen, J., & Ji, Q. (2015). A probabilistic approach to online eye gaze tracking without explicit personal calibration. IEEE Transactions on Image Processing, 24(3), 1076–1086.

    MathSciNet  Article  Google Scholar 

  • Chen, Q., Song, Z., Hua, Y., Huang, Z., & Yan, S. (2012). Hierarchical matching with side information for image classification. In IEEE conference on computer vision and pattern recognition (pp. 3426–3433).

  • Chen, T., Cheng, M.-M., Tan, P., Shamir, A., & Hu, S.-M. (2009). Sketch2photo: Internet image montage. ACM Transactions on Graphics, 28, 124.

    Google Scholar 

  • Chen, Y., Nguyen, T., Kankanhalli, M. S., Yuan, J., Yan, S., & Wang, M. (2014). Audio matters in visual attention. IEEE Transactions on Circuits and Systems for Video Technology, 24(11), 1992–2003.

    Article  Google Scholar 

  • Cheng, M., Mitra, N. J., Huang, X., Torr, P. H. S., & Hu, S. (2015). Global contrast based salient region detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3), 569–582.

    Article  Google Scholar 

  • Cheng, M.-M., Zhang, Z., Lin, W.-Y., & Torr, P. H. S. (2014). BING: Binarized normed gradients for objectness estimation at 300 fps. In IEEE conference on computer vision and pattern recognition (pp. 3286–3293).

  • Chia, A., Zhuo, S., Gupta, R. K., Tai, Y.-W., Cho, S., Tan, P., et al. (2011). Semantic colorization with internet images. ACM Transactions on Graphics, 30, 1–7.

    Article  Google Scholar 

  • Choi, J., Ahn, B., Park, J., & Kweon, I. (2014). Gmm-based saliency aggregation for calibration-free gaze estimation. In IEEE international conference on image processing (pp. 1096–1099).

  • Choi, J., Oh, T., & Kweon, I. (2016). Human attention estimation for natural images: An automatic gaze refinement approach. CoRR (bs/1601.02852).

  • Courty, N., & Marchand, E. (2003). Visual perception based on salient features. In International conference on intelligent robots and systems (Vol. 1, pp. 1024–1029).

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE conference on computer vision and computer vision (pp. 886–893).

  • Dankers, A., Barnes, N., & Zelinsky, A. (2007). A reactive vision system: Active-dynamic saliency. In International conference on computer vision systems.

  • DeCarlo, D., & Santella, A. (2002). Stylization and abstraction of photographs. ACM Transactions on Graphics, 21(3), 769–776.

    Article  Google Scholar 

  • Desingh, K. Krishna, K. M., Rajan, D., & Jawahar, C.(2013). Depth really matters: Improving visual salient region detection with depth. In British machine vision conference.

  • Donoser, M., Urschler, M., Hirzer, M., & Bischof, H. (2009). Saliency driven total variation segmentation. In IEEE 12th international conference on computer vision (pp. 817–824).

  • Drewes, H., Luca, A. D., & Schmidt, A. (2007). Eye-gaze interaction for mobile phones. In Proceedings of international conference on mobile technology, applications, and systems (pp. 364–371).

  • Ehinger, K., Hidalgo-Sotelo, B., Torralba, A., & Oliva, A. (2009). Modeling search for people in 900 scenes. Visual Cognition, 17, 945–978.

  • El-Nasr, M. S., Vasilakos, A., Rao, C., & Zupko, J. (2009). Dynamic intelligent lighting for directing visual attention in interactive 3-d scenes. IEEE Transactions on Computational Intelligence and AI in Games, 1(2), 145–153.

    Article  Google Scholar 

  • Elazary, L., & Itti, L. (2008). Interesting objects are visually salient. Journal of Vision, 8(3), 3–3.

    Article  Google Scholar 

  • Erdem, E., & Erdem, A. (2013). Visual saliency estimation by nonlinearly integrating features using region covariances. Journal of Vision, 13(4), 1–20.

    Article  Google Scholar 

  • Everingham, M., Gool, L. J. V., Williams, C. K. I., Winn, J. M., & Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.

    Article  Google Scholar 

  • Feng, S., Xu, D., & Yang, X. (2010). Attention-driven salient edge (s) and region (s) extraction with application to cbir. Signal Processing, 90(1), 1–15.

    MATH  Article  Google Scholar 

  • Frintrop, S. (2006). VOCUS: A visual attention system for object detection and goal-directed search (Vol. 3899).

  • Frintrop, S. (2011). Towards attentive robots. Paladyn, 2(2), 64–70.

    Google Scholar 

  • Frintrop, S., Garcia, G. M., & Cremers, A. B. (2014). A cognitive approach for object discovery. In International conference on pattern recognition (pp. 2329–2334).

  • Frintrop, S., & Jensfelt, P. (2008). Attentional landmarks and active gaze control for visual SLAM. IEEE Transactions on Robotics, 24(5), 1054–1065.

    Article  Google Scholar 

  • Frintrop, S., & Kessel, M. (2009). Most salient region tracking. In IEEE international conference on robotics and automation (pp. 1869–1874).

  • Frintrop, S., Königs, A., Hoeller, F., & Schulz, D. (2010). A component-based approach to visual person tracking from a mobile platform. International Journal of Social Robotics, 2(1), 53–62.

    Article  Google Scholar 

  • Fritz, G., Seifert, C., Paletta, L., & Bischof, H. (2004). Attentive object detection using an information theoretic saliency measure. In International workshop on attention and performance in computational vision (pp. 29–41).

  • Gadde, R. & Karlapalem, K. (2011). Aesthetic guideline driven photography by robots. In International joint conference on artificial intelligence (Vol. 22, pp. 2060).

  • Gao, D., Han, S., & Vasconcelos, N. (2009). Discriminant saliency, the detection of suspicious coincidences, and applications to visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6), 989–1005.

    Article  Google Scholar 

  • Gao, D. & Vasconcelos, N. (2004). Discriminant saliency for visual recognition from cluttered scenes. In Advances in neural information processing systems (pp. 481–488).

  • Gao, Y., Shi, M., Tao, D., & Xu, C. (2015). Database saliency for fast image retrieval. IEEE Transactions on Multimedia, 17(3), 359–369.

    Article  Google Scholar 

  • Gautier, J., Le Meur, O., & Guillemot, C. (2012). Efficient depth map compression based on lossless edge coding and diffusion. In Picture coding symposium (pp. 81–84).

  • Girshick, R. B. (2015). Fast R-CNN. In IEEE international conference on computer vision, ICCV (pp. 1440–1448).

  • Goferman, S., Tal, A., & Zelnik-Manor, L. (2010). Puzzle-like collage. Computer Graphics Forum, 29, 459–468.

    Article  Google Scholar 

  • Goferman, S., Zelnik-Manor, L., & Tal, A. (2010). Context-aware saliency detection. In IEEE conference on computer vision and pattern recognition (pp. 2376–2383).

  • Goldberg, C., Chen, T., Zhang, F., Shamir, A., & Hu, S. (2012). Data-driven object manipulation in images. Computer Graphics Forum, 31, 265–274.

    Article  Google Scholar 

  • Graves, A. (2013). Generating sequences with recurrent neural networks. CoRR (abs/1308.0850).

  • Gupta, R., Khanna, M. T., & Chaudhury, S. (2013). Visual saliency guided video compression algorithm. Signal Processing: Image Communication, 28(9), 1006–1022.

    Google Scholar 

  • Han, S., & Vasconcelos, N. (2010). Biologically plausible saliency mechanisms improve feedforward object recognition. Vision Research, 50(22), 2295–2307.

    Article  Google Scholar 

  • Haque, A., Alahi, A., & Fei-Fei, L.(2016). Recurrent attention models for depth-based person identification. In IEEE conference on computer vision and pattern recognition (pp. 1229–1238).

  • Harel, J., Koch, C., & Perona, P. (2006). Graph-based visual saliency. In Advances in neural information processing systems (pp. 545–552).

  • Heidemann, G., Rae, R., Bekel, H., Bax, I., & Ritter, H. (2004). Integrating context-free and context-dependent attentional mechanisms for gestural object reference. Machine Vision and Applications, 16(1), 64–73.

    MATH  Article  Google Scholar 

  • Hong, B., & Brady, M. (2003). A topographic representation for mammogram segmentation. In Medical image computing and computer-assisted intervention (pp. 730–737).

  • Hong, R., Wang, M., Xu, M., Yan, S., & Chua, T. (2010). Dynamic captioning: Video accessibility enhancement for hearing impairment. In ACM multimedia.

  • Hou, X., & Zhang, L. (2007). Saliency detection: A spectral residual approach. In IEEE conference on computer vision and pattern recognition.

  • Hou, X., & Zhang, L. (2008). Dynamic visual attention: Searching for coding length increments. Advances in Neural Information Processing Systems, 21, 681–688.

    Google Scholar 

  • Huang, H., Zhang, L., & Zhang, H.-C. (2011). Arcimboldo-like collage using internet images. ACM Transactions on Graphics, 30, 1–7.

    Google Scholar 

  • iLab, C., (2010). Neuromorphic vision. Toolkit.

  • Ishiguro, Y., Mujibiya, A., Miyaki, T., & Rekimoto, J. (2010). Aided eyes: Eye activity sensing for daily life. In Proceedings of augmented human international conference (p. 25).

  • Itti, L. (2004). Automatic foveation for video compression using a neurobiological model of visual attention. IEEE Transactions on Image Processing, 13(10), 1304–1318.

    Article  Google Scholar 

  • Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259.

    Article  Google Scholar 

  • Jacobson, N., Lee, Y., Mahadevan, V., Vasconcelos, N., & Nguyen, T. Q. (2010). A novel approach to fruc using discriminant saliency and frame segmentation. IEEE Transactions on Image Processing, 19(11), 2924–2934.

    MathSciNet  MATH  Article  Google Scholar 

  • Ji, Q., Fang, Z., Xie, Z., & Lu, Z. (2013). Video abstraction based on the visual attention model and online clustering. Signal Processing: Image Communication, 28(3), 241–253.

    Google Scholar 

  • Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. In ACM multimedia (pp. 675–678).

  • Jiang, H., Wang, J., Yuan, Z., Wu, Y., Zheng, N., & Li, S. (2013). Salient object detection: A discriminative regional feature integration approach. In IEEE conference on computer vision and pattern recognition (pp. 2083–2090).

  • Jiang, M., Huang, S., Duan, J., & Zhao, Q. (2015a). Mouse saliency-a new method for low-cost large-scale attentional data collection. Journal of Vision, 15(12), 221–221.

    Article  Google Scholar 

  • Jiang, M., Huang, S., Duan, J., & Zhao, Q. (2015b). SALICON: Saliency in context. In IEEE conference on computer vision and pattern recognition.

  • Johnson-Roberson, M., Bohg, J., Björkman, M., & Kragic, D. (2010). Attention-based active 3d point cloud segmentation. In International conference on intelligent robots and systems (pp. 1165–1170).

  • Kadir, T., & Brady, M. (2001). Saliency, scale and image description. International Journal of Computer Vision, 45(2), 83–105.

    MATH  Article  Google Scholar 

  • Kanan, C., & Cottrell, G. W. (2010). Robust classification of objects, faces, and flowers using natural image statistics. In IEEE conference on computer vision and pattern recognition (pp. 2472–2479).

  • Karpathy, A., Miller, S., & Fei-Fei, L. (2013). Object discovery in 3d scenes via shape analysis. In IEEE international conference on robotics and automation (pp. 2088–2095).

  • Kim, J., Han, D., Tai, Y., & Kim, J. (2014). Salient region detection via high-dimensional color transform. In IEEE conference on computer vision and pattern recognition (pp. 883–890).

  • Kläser, A., Marszalek, M., & Schmid, C. (2008). A spatio-temporal descriptor based on 3D-gradients. In British machine vision conference.

  • Klein, D. A., Schulz, D., Frintrop, S., & Cremers, A. B. (2010). Adaptive real-time video-tracking for arbitrary objects. In International conference on intelligent robots and systems (pp. 772–777).

  • Koch, C., & Ullman,S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology, 4, 219–227.

  • Kolmogorov, V., & Zabih,R. (2004). What energy functions can be minimized via graph cuts? volume 26, pages 147–159.

  • Krähenbühl, P., & Koltun,V. (2014). Geodesic object proposals. In European conference on computer vision (pp. 725–739).

  • Lance, B., & Marsella, S. (2010). The expressive gaze model: Using gaze to express emotion. IEEE Computer Graphics and Applications, 30(4), 62–73.

    Article  Google Scholar 

  • Lance, B., Marsella, S., & Koizumi, D. (2004). Towards expressive gaze manner in embodied virtual agents. In AAMAS workshop on empathic agents New-York.

  • Lang, C., Nguyen, T., Katti, H., Yadati, K., Kankanhalli,M.S., & Yan,S. (2012). Depth matters: Influence of depth cues on visual saliency. In European conference on computer vision (pp. 101–115).

  • Laptev, I., Marszalek, M., Schmid, C., & Rozenfeld,B. (2008). Learning realistic human actions from movies. In IEEE conference on computer vision and pattern recognition.

  • Lazebnik, S., Schmid, C., & Ponce,J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Computer vision and pattern recognition (pp. 2169–2178).

  • Le Meur, O., Le Callet, P., Barba, D., & Thoreau, D. (2006). A coherent computational approach to model bottom-up visual attention. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(5), 802–817.

    Article  Google Scholar 

  • Lee, C. H., Varshney, A., & Jacobs, D. W. (2005). Mesh saliency. ACM Transactions on Graphics, 24, 659–666.

    Article  Google Scholar 

  • Lee, Y. J. , Ghosh, J., & Grauman, K. (2012). Discovering important people and objects for egocentric video summarization. In IEEE conference on computer vision and pattern recognition (Vol. 2, p. 6).

  • Li, A., She, X., & Sun,Q. (2013). Color image quality assessment combining saliency and fsim. In International conference on digital image processing.

  • Li, H., & Ngan, K. N. (2008). Saliency model-based face segmentation and tracking in head-and-shoulder video sequences. Journal of Visual Communication and Image Representation, 19(5), 320–333.

    Article  Google Scholar 

  • Li, L., Jiang, S., Zha, Z.-J., Wu, Z., & Huang, Q. (2013). Partial-duplicate image retrieval via saliency-guided visual matching. IEEE MultiMedia, 20(3), 13–23.

    Article  Google Scholar 

  • Li, L., Mei, T., & Hua, X.-S. (2010a). Gamesense: Game-like in-image advertising. Multimedia Tools and Applications, 49(1), 145–166.

    Article  Google Scholar 

  • Li, L., Mei, T., Hua, X.-S., & Li, S. (2008). Imagesense. In ACM multimedia (pp. 1027–1028).

  • Li, L., Mei, T., Niu, X., & Ngo, C.-W. (2010b). Pagesense: Style-wise web page advertising. In International conference on world wide web (pp. 1273–1276).

  • Li, Q., Zhou, Y., & Yang, J. (2011). Saliency based image segmentation. In International conference on multimedia technology (pp. 5068–5071).

  • Li, Y., Hou, X., Koch, C., Rehg, J. M., & Yuille, A. L. (2014). The secrets of salient object segmentation. In IEEE conference on computer vision and pattern recognition (pp. 280–287).

  • Liu, H., & Heynderickx, I. (2009). Studying the added value of visual attention in objective image quality metrics based on eye movement data. In IEEE international conference on image processing (pp. 3097–3100).

  • Liu, H., Jiang, S., Huang, Q., & Xu, C. (2008). A generic virtual content insertion system based on visual attention analysis. In ACM multimedia (pp. 379–388).

  • Liu, T., Yuan, Z., Sun, J., Wang, J., Zheng, N., Tang, X., et al. (2011). Learning to detect a salient object. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2), 353–367.

    Article  Google Scholar 

  • Lowe, D. G. (1999). Object recognition from local scale-invariant features. In IEEE international conference on computer vision (pp. 1150–1157).

  • Luebke, D. (2016). Towards foveated rendering for gaze-tracked virtual reality. ACM Transactions on Graphics, 35(6), 179:1–179:12.

  • Luong, M.-T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.

  • Ma, Y.-F., Hua, X.-S., Lu, L., & Zhang, H.-J. (2005). A generic framework of user attention model and its application in video summarization. IEEE Transactions on Multimedia, 7(5), 907–919.

    Article  Google Scholar 

  • Mahadevan, V., & Vasconcelos, N. (2009). Saliency-based discriminant tracking. In IEEE conference on computer vision and pattern recognition (pp. 1007–1013).

  • Maki, A., Nordlund, P., & Eklundh, J. (2000). Attentional scene segmentation: Integrating depth and motion. Computer Vision and Image Understanding, 78(3), 351–373.

    Article  Google Scholar 

  • Marchesotti, L., Cifarelli, C., & Csurka, G. (2009). A framework for visual saliency detection with applications to image thumbnailing. In IEEE conference on computer vision (pp. 2232–2239).

  • Margolin, R., Zelnik, L., & Tal, A. (2013). Saliency for image manipulation. The Visual Computer, 29(5), 381–392.

    Article  Google Scholar 

  • Martín-Martín, A., Ayllón, J. M., Orduña-Malea, E., & López-Cózar, E. D. (2014). Google scholar metrics 2014: A low cost bibliometric tool. arXiv preprint arXiv:1407.2827.

  • Mateescu, V. A., & Bajić, I. V. (2014). Attention retargeting by color manipulation in images. In International workshop on perception inspired video processing (pp. 15–20).

  • Mathe, S., & Sminchisescu, C., (2012). Dynamic eye movement datasets and learnt saliency models for visual action recognition. In European conference computer vision (pp. 842–856).

  • Mathe, S., & Sminchisescu, C., (2013). Actions in the eye: Dynamic gaze datasets and learnt saliency models for visual recognition. CoRR (abs/1312.7570).

  • Meger, D., Forssén, P.-E., Lai, K., Helmer, S., McCann, S., Southey, T., et al. (2008). Curious george: An attentive semantic robot. Robotics and Autonomous Systems, 56(6), 503–511.

    Article  Google Scholar 

  • Mei, T., Li, L., Hua, X.-S., & Li, S. (2012). Imagesense: Towards contextual image advertising. ACM Transactions on Multimedia Computing, Communications, and Applications, 8(1), 6.

    Article  Google Scholar 

  • Mertsching, B., Bollmann, M., Massad, A., & Schmalz, S., (1998). Recognition of complex objects with an active vision system. In Symposium on neural computation (pp. 469–475).

  • Mishra, A. K., Aloimonos, Y., Cheong, L. F., & Kassim, A. A. (2012). Active visual segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4), 639–653.

    Article  Google Scholar 

  • Mitri, S., Frintrop, S., Pervölz, K., Surmann, H., & Nüchter, A., (2005). Robust object detection at regions of interest with an application in ball recognition. In IEEE international conference on robotics and automation (pp. 125–130).

  • Mnih, V., Heess, N., Graves, A., & Kavukcuoglu, K., (2014). Recurrent models of visual attention. In Advances in neural information processing systems (pp. 2204–2212).

  • Moosmann, F., Larlus, D., & Jurie, F., (2006). Learning saliency maps for object categorization. In ECCV workshop on the representation and use of prior knowledge in vision.

  • Muhl, C., Nagai, Y., & Sagerer, G., (2007). On constructing a communicative space in hri. In KI 2007: Advances in artificial intelligence (pp. 264–278).

  • Muratov, O., Dang, T., Boato, G., & De Natale, F., (2012). Saliency detection as a support for image forensics. In International symposium on communications control and signal processing (pp. 1–5).

  • Murray, N., Vanrell, M., Otazu, X., & Párraga, C. A. (2011). Saliency estimation using a non-parametric low-level vision model. In IEEE conference on computer vision and pattern recognition (pp. 433–440).

  • Nagai, Y. (2009). From bottom-up visual attention to robot action learning. In International conference on development and learning (pp. 1–6).

  • Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimizing detection speed. In 2006 IEEE conference on computer vision and pattern recognition (pp. 2049–2056).

  • Nguyen, P., Fleureau, J., Chamaret, C., & Guillotel, P. (2013). Calibration-free gaze tracking using particle filter. In IEEE international conference on multimedia and expo (pp. 1–6).

  • Nguyen, T., Li, L., Tan, J., & Yan, S. (2012). 3DME: 3d media express from rgb-d images. In ACM multimedia (pp. 1331–1332).

  • Nguyen, T., Ni, B., Liu, H., Xia, W., Luo, J., Kankanhalli, M., et al. (2013). Image re-attentionizing. IEEE Transactions on Multimedia, 15(8), 1910–1919.

    Article  Google Scholar 

  • Nguyen, T., Song, Z., & Yan, S. (2015). STAP: spatial-temporal attention-aware pooling for action recognition. IEEE Transactions on Circuits and Systems for Video Technology, 25(1), 77–86.

    Article  Google Scholar 

  • Nguyen, T., Xu, M., Gao, G., Kankanhalli, M. S., Tian, Q., & Yan, S. (2013). Static saliency vs. dynamic saliency: A comparative study. In ACM multimedia (pp. 987–996).

  • Nguyen, T. V. (2015). Salient object detection via objectness proposals. In Proceedings of AAAI conference on artificial intelligence (pp. 4286–4287).

  • Nguyen, T. V., & Liu, L. (2017). Salient object detection with semantic priors. In International joint conference on artificial intelligence.

  • Nguyen, T. V., & Sepulveda, J. (2015). Salient object detection via augmented hypotheses. In International joint conference on artificial intelligence (pp. 2176–2182).

  • Ni, B., Xu, M., Nguyen, T., Wang, M., Lang, C., Huang, Z., et al. (2014). Touch saliency: Characteristics and prediction. IEEE Transactions on Multimedia, 16(6), 1779–1791.

    Article  Google Scholar 

  • Ninassi, A., Le Meur, O., Le Callet, P., & Barbba, D. (2007). Does where you gaze on an image affect your perception of quality? applying visual attention to image quality metric. In IEEE international conference on image processing (Vol. 2, pp. 2–169).

  • Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.

    MATH  Article  Google Scholar 

  • Ouerhani, N., Bracamonte, J., Hugli, H., Ansorge, M., & Pellandini, F. (2001). Adaptive color image compression based on visual attention. In International conference on image analysis and processing (pp. 416–421).

  • Ouerhani, N., Bur, A., & Hügli, H. (2005). Visual attention-based robot self-localization. In European conference on mobile robots (pp. 8–13).

  • Papadopoulos, D. P., Clarke, A. D., Keller, F., & Ferrari, V. (2014). Training object class detectors from eye tracking data. In European conference on computer vision (pp. 361–376).

  • Parikh, N., Itti, L., & Weiland, J. (2010). Saliency-based image processing for retinal prostheses. Journal of Neural Engineering, 7(1), 1–10.

    Article  Google Scholar 

  • Perazzi, F., Krähenbühl, P., Pritch, Y., & Hornung, A. (2012). Saliency filters: Contrast based filtering for salient region detection. In IEEE conference on computer vision and pattern recognition (pp. 733–740).

  • Perra, D., Gupta, R. K., & Frahm, J. (2015). Adaptive eye-camera calibration for head-worn devices. In IEEE conference on computer vision and pattern recognition (pp. 4146–4155).

  • Qin, C., Zhang, G., Zhou, Y., Tao, W., & Cao, Z. (2014). Integration of the saliency-based seed extraction and random walks for image segmentation. Neurocomputing, 129, 378–391.

    Article  Google Scholar 

  • Queiroz, R. B., Barros, L. M., & Musse, S. R. (2007). Automatic generation of expressive gaze in virtual animated characters: From artists craft to a behavioral animation model. In Intelligent virtual agents, 7th international conference (pp. 401–402).

  • Ren, S., He, K., Girshick, R. B., & Sun, J. (2017). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137–1149.

    Article  Google Scholar 

  • Ren, Z., Gao, S., Chia, L., & Tsang, I. W. (2014). Region-based saliency detection and its application in object recognition. IEEE Transactions on Circuits and Systems for Video Technology, 24(5), 769–779.

    Article  Google Scholar 

  • Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025.

    Article  Google Scholar 

  • Roberts, R., Ta, D.-N., Straub, J., Ok, K., & Dellaert, F. (2012). Saliency detection and model-based tracking: A two part vision system for small robot navigation in forested environment. In SPIE defense, security, and sensing (pp. 83870S–83870S).

  • Rosenholtz, R., Dorai, A., & Freeman, R. (2011). Do predictions of visual perception aid design? ACM Transactions on Applied Perception, 8(2), 12.

    Article  Google Scholar 

  • Rother, C., Kolmogorov, V., & Blake, A. (2004). Grabcut: Interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23(3), 309–314.

    Article  Google Scholar 

  • Rutishauser, U., Walther, D., Koch, C., & Perona, P. (2004). Is bottom-up attention useful for object recognition? In IEEE conference on computer vision and pattern recognition (pp. 37–44).

  • Sadaka, N., & Karam, L. (2009). Efficient perceptual attentive super-resolution. In IEEE international conference on image processing (pp. 3113–3116).

  • Salah, A., Alpaydin, E., & Akarun, L. (2002). A selective attention-based method for visual pattern recognition with application to handwritten digit recognition and face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3), 420–425.

    Article  Google Scholar 

  • Scheier, C., & Egner, S. (1997). Visual attention in a mobile robot. In IEEE international symposium on industrial electronics (Vol. 1, pp. 48–52).

  • Schneider, W., & Shiffrin, R. M. (1977). Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review, 84(1), 1.

    Article  Google Scholar 

  • Setlur, V., Takagi, S., Raskar, R., Gleicher, M., & Gooch, B. (2005). Automatic image retargeting. In International conference on mobile and ubiquitous multimedia (pp. 59–68).

  • Shen, C., & Zhao, Q. (2014). Webpage saliency. In European conference on computer vision (pp. 33–46).

  • Shen, H., Li, S., Zhu, C., Chang, H., & Zhang, J. (2013). Moving object detection in aerial video based on spatiotemporal saliency. Chinese Journal of Aeronautics, 26(5), 1211–1217.

  • Shiffrin, R., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychological Review, 84(2), 127.

  • Siagian, C., & Itti, L. (2007). Rapid biologically-inspired scene classification using features shared with visual attention. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(2), 300–312.

  • Siagian, C., & Itti, L. (2009). Biologically inspired mobile robot vision localization. IEEE Transactions on Robotics, 25(4), 861–873.

  • Simoncelli, E. (1996). Foundations of vision.

  • Srivatsa, R. S., & Babu, R. V. (2015). Salient object detection via objectness measure. In International conference on image processing (pp. 4481–4485).

  • Stalder, S., Grabner, H., & Gool, L. J. V. (2012). Dynamic objectness for adaptive tracking. In Asian conference on computer vision (pp. 43–56).

  • Stentiford, F. (2003). Attention-based image similarity measure with application to content-based information retrieval. In Electronic imaging (pp. 221–232).

  • Sugano, Y., Matsushita, Y., & Sato, Y. (2010). Calibration-free gaze sensing using saliency maps. In IEEE conference on computer vision and pattern recognition (pp. 2667–2674).

  • Suh, B., Ling, H., Bederson, B. B., & Jacobs, D. W. (2003). Automatic thumbnail cropping and its effectiveness. In ACM symposium on user interface software and technology (pp. 95–104).

  • Tanaka, R., Narumi, T., Tanikawa, T., & Hirose, M. (2015). Attracting user’s attention in spherical image by angular shift of virtual camera direction. In ACM symposium on spatial user interaction (pp. 61–64).

  • Tatler, B., Hayhoe, M., Land, M., & Ballard, D. (2011). Eye guidance in natural vision: Reinterpreting salience. Journal of Vision, 11(5), 5.

  • Vig, E., Dorr, M., & Cox, D. D. (2012). Space-variant descriptor sampling for action recognition based on saliency and eye movements. In European conference on computer vision (pp. 84–97).

  • Vijayakumar, S., Conradt, J., Shibata, T., & Schaal, S. (2001). Overt visual attention for a humanoid robot. In Proceedings 2001 IEEE/RSJ international conference on intelligent robots and systems (Vol. 4, pp. 2332–2337).

  • Viola, P., & Jones, M. (2004). Robust real-time face detection. International Journal of Computer Vision, 57(2), 137–154.

  • Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19(9), 1395–1407.

  • Wang, H., Kläser, A., Schmid, C., & Liu, C. (2011). Action recognition by dense trajectories. In IEEE conference on computer vision and pattern recognition (pp. 3169–3176).

  • Wang, H., & Schmid, C. (2013). Action recognition with improved trajectories. In IEEE international conference on computer vision (pp. 3551–3558).

  • Wang, H., Ullah, M., Kläser, A., Laptev, I., & Schmid, C. (2009). Evaluation of local spatio-temporal features for action recognition. In British machine vision conference.

  • Wang, J., Quan, L., Sun, J., Tang, X., & Shum, H.-Y. (2006). Picture collage. In IEEE conference on computer vision and pattern recognition (Vol. 1, pp. 347–354).

  • Wolfe, J. M. (1994). Guided search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1(2), 202–238.

  • Wong, L., & Low, K. (2011). Saliency retargeting: An approach to enhance image aesthetics. In IEEE workshop on applications of computer vision (pp. 73–80).

  • Wong, L.-K., & Low, K.-L. (2009). Saliency-enhanced image aesthetics class prediction. In IEEE international conference on image processing (pp. 997–1000).

  • Wong, L.-K., & Low, K.-L. (2012). Enhancing visual dominance by semantics-preserving image recomposition. In ACM multimedia (pp. 845–848).

  • Xu, J., Mukherjee, L., Li, Y., Warner, J., Rehg, J. M., & Singh, V. (2015). Gaze-enabled egocentric video summarization via constrained submodular maximization. In IEEE conference on computer vision and pattern recognition.

  • Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A. C., Salakhutdinov, R., Zemel, R. S., & Bengio, Y. (2015). Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning (pp. 2048–2057).

  • Xu, K., Chen, K., Fu, H., Sun, W.-L., & Hu, S.-M. (2013). Sketch2Scene: Sketch-based co-retrieval and co-placement of 3D models. ACM Transactions on Graphics, 32(4), 1–12.

  • Yan, Q., Xu, L., Shi, J., & Jia, J. (2013). Hierarchical saliency detection. In IEEE conference on computer vision and pattern recognition (pp. 1155–1162).

  • Yang, Z., He, X., Gao, J., Deng, L., & Smola, A. J. (2016). Stacked attention networks for image question answering. In IEEE conference on computer vision and pattern recognition (pp. 21–29).

  • Yun, K., Peng, Y., Samaras, D., Zelinsky, G. J., & Berg, T. L. (2013). Studying relationships between human gaze, description, and computer vision. In IEEE conference on computer vision and pattern recognition (pp. 739–746).

  • Zhai, Y., & Shah, M. (2006). Visual attention detection in video sequences using spatiotemporal cues. In ACM international conference on multimedia (pp. 815–824).

  • Zhang, G., Yuan, Z., Zheng, N., Sheng, X., & Liu, T. (2009). Visual saliency based object tracking. In Asian conference on computer vision (pp. 193–203).

  • Zhang, G.-X., Cheng, M.-M., Hu, S.-M., & Martin, R. R. (2009). A shape-preserving approach to image resizing. Computer Graphics Forum, 28, 1897–1906.

  • Zhang, J., & Sclaroff, S. (2013). Saliency detection: A boolean map approach. In IEEE international conference on computer vision (pp. 153–160).

  • Zhang, L., Tong, M. H., Marks, T. K., Shan, H., & Cottrell, G. W. (2008). SUN: A Bayesian framework for saliency using natural statistics. Journal of Vision, 8(7), 32.

  • Zhang, X., Sugano, Y., Fritz, M., & Bulling, A. (2015). Appearance-based gaze estimation in the wild. In IEEE conference on computer vision and pattern recognition (pp. 4511–4520).

  • Zhao, R., Ouyang, W., & Wang, X. (2013a). Person re-identification by salience matching. In IEEE international conference on computer vision (pp. 2528–2535).

  • Zhao, R., Ouyang, W., & Wang, X. (2013b). Unsupervised salience learning for person re-identification. In IEEE conference on computer vision and pattern recognition (pp. 3586–3593).

  • Zhao, R., Ouyang, W., & Wang, X. (2015). Person re-identification by saliency learning. CoRR (abs/1412.1908).

  • Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2014). Object detectors emerge in deep scene CNNs. In International conference on learning representations.

  • Zhou, B., Lapedriza, À., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In Advances in neural information processing systems (pp. 487–495).

  • Zitnick, C. L., & Dollár, P. (2014). Edge boxes: Locating object proposals from edges. In European conference on computer vision (pp. 391–405).

Author information

Corresponding author

Correspondence to Tam V. Nguyen.

Additional information

Communicated by Yoichi Sato.

About this article

Cite this article

Nguyen, T.V., Zhao, Q. & Yan, S. Attentive Systems: A Survey. Int J Comput Vis 126, 86–110 (2018). https://doi.org/10.1007/s11263-017-1042-6

  • DOI: https://doi.org/10.1007/s11263-017-1042-6

Keywords

  • Attentive systems
  • Visual attention
  • Salient region/object detection
  • Interestingness
  • Scene understanding
  • Computer graphics
  • Image retargeting
  • Feature pooling
  • Multimedia
  • Compression