
A video summarization approach based on the emulation of bottom-up mechanisms of visual attention

Published in: Journal of Intelligent Information Systems

Abstract

This work addresses the development of a computational model of visual attention to perform automatic summarization of digital videos from television archives. Although the television system represents one of the most fascinating media phenomena ever created, effective solutions for content-based information retrieval from video recordings of its programs are still lacking. This fact relates to the high complexity of the content-based video retrieval problem, which involves several challenges, among which we highlight the demand for video summaries that facilitate indexing, browsing and retrieval operations. To achieve this goal, we propose a new computational visual attention model, inspired by the human visual system and based on computer vision methods (face detection, motion estimation and saliency map computation), to estimate static video abstracts, that is, collections of salient images or key frames extracted from the original videos. Experimental results with videos from the Open Video Project show that our approach represents an effective solution to the problem of automatic video summarization, producing video summaries with quality similar to the ground truth manually created by a group of 50 users.
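As a rough illustration of the kind of bottom-up pipeline the abstract describes (face detection, motion estimation and saliency map computation fused into a per-frame attention score that drives key frame selection), the sketch below combines off-the-shelf OpenCV components. The cue weights, the spectral-residual saliency operator, the Farneback optical flow and the top-k frame selection are assumptions made for illustration only, not the authors' exact model.

```python
# Illustrative sketch only: per-frame bottom-up attention score from three visual cues.
# Assumes opencv-python (plus opencv-contrib-python for the saliency module) and numpy.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")  # Viola-Jones face detector
saliency = cv2.saliency.StaticSaliencySpectralResidual_create()     # requires opencv-contrib

def attention_scores(video_path, w_face=0.4, w_motion=0.3, w_saliency=0.3):
    """Return one attention score per frame; the weights are illustrative assumptions."""
    cap = cv2.VideoCapture(video_path)
    scores, prev_gray = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Face cue: fraction of the frame area covered by detected faces.
        faces = face_cascade.detectMultiScale(gray, 1.1, 5)
        face_cue = sum(w * h for (_, _, w, h) in faces) / float(gray.size)

        # Motion cue: mean dense optical-flow magnitude w.r.t. the previous frame.
        motion_cue = 0.0
        if prev_gray is not None:
            flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            motion_cue = float(np.mean(np.linalg.norm(flow, axis=2)))
        prev_gray = gray

        # Saliency cue: mean value of a spectral-residual static saliency map.
        ok_sal, sal_map = saliency.computeSaliency(frame)
        saliency_cue = float(np.mean(sal_map)) if ok_sal else 0.0

        # Cues are combined without normalization here; a real system would calibrate them.
        scores.append(w_face * face_cue + w_motion * motion_cue + w_saliency * saliency_cue)
    cap.release()
    return scores

def key_frames(scores, n=10):
    """Naive static summary: indices of the n highest-scoring frames (illustration only)."""
    return sorted(np.argsort(scores)[-n:])
```

The sketch only indicates how the three visual cues mentioned in the abstract might be fused into a single score per frame; in the paper the resulting summaries are evaluated against user-generated ground truth built from Open Video Project videos.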




Acknowledgments

The authors gratefully acknowledge the financial support of CNPq under Procs. 468042/2014-8 and 313163/2014-6; FAPEMIG under Procs. APQ-01180-10, APQ-02269-11 and PPM-00542-15; CEFET-MG under Procs. PROPESQ-088/12, PROPESQ-076/09 and PROPESQ-10314/14; and CAPES.

Author information

Correspondence to Flávio L. C. Pádua.


About this article

Cite this article

Jacob, H., Pádua, F.L.C., Lacerda, A. et al. A video summarization approach based on the emulation of bottom-up mechanisms of visual attention. J Intell Inf Syst 49, 193–211 (2017). https://doi.org/10.1007/s10844-016-0441-4

