Robust spatio-temporal saliency estimation method for H.264 compressed videos

Abstract

This paper presents a robust spatio-temporal saliency estimation method based on modeling motion vectors and transform residuals extracted from the H.264/AVC compressed bitstream. Spatial saliency is estimated by analyzing the detail sub-band coefficients obtained from the wavelet decomposition of the luminance component of the macroblocks, while temporal saliency is estimated by modeling the block motion vector orientation information using local derivative patterns. The Dempster-Shafer fusion rule is used to combine the spatial and motion saliency maps into a final saliency map for each video frame. Extensive experimental validation, along with comparative analysis against state-of-the-art methods, is carried out to establish the effectiveness of the proposed method.
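
The abstract outlines a three-stage pipeline: wavelet detail sub-bands for spatial saliency, motion-vector orientation modeling for temporal saliency, and Dempster-Shafer combination of the two maps. The sketch below is a minimal, illustrative reconstruction in Python, not the authors' implementation: the Haar wavelet, the 16x16 macroblock grid, the simple orientation-difference surrogate standing in for local derivative patterns, the two-hypothesis basic probability assignment with a fixed ignorance mass of 0.1, the PyWavelets (pywt) dependency, and all function names are our assumptions.

```python
# Illustrative sketch of the pipeline described in the abstract.
# All parameter choices below are assumptions, not the paper's values.
import numpy as np
import pywt  # PyWavelets, for the 2-D wavelet decomposition


def spatial_saliency(luma):
    """Spatial saliency from the detail sub-bands of a one-level 2-D
    wavelet decomposition of the luminance plane (Haar is an assumed
    choice; the abstract does not specify the wavelet)."""
    _, (cH, cV, cD) = pywt.dwt2(luma.astype(np.float64), "haar")
    detail = np.sqrt(cH ** 2 + cV ** 2 + cD ** 2)  # pooled detail energy
    # Upsample back to frame resolution and normalise to [0, 1].
    sal = np.kron(detail, np.ones((2, 2)))[: luma.shape[0], : luma.shape[1]]
    return sal / (sal.max() + 1e-12)


def temporal_saliency(mvx, mvy):
    """Temporal saliency from block motion-vector orientations: a block
    whose orientation disagrees with its 4-neighbourhood scores high.
    This orientation-difference measure is a simplified stand-in for
    the local-derivative-pattern modeling used in the paper."""
    ang = np.arctan2(mvy, mvx)  # one orientation angle per block
    sal = np.zeros_like(ang)
    for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        neigh = np.roll(np.roll(ang, dy, axis=0), dx, axis=1)
        # Wrap angular differences into [-pi, pi] before taking magnitude.
        sal += np.abs(np.angle(np.exp(1j * (ang - neigh))))
    return sal / (sal.max() + 1e-12)


def dempster_shafer_fuse(m_spatial, m_temporal, ignorance=0.1):
    """Per-pixel Dempster-Shafer combination over the two-hypothesis
    frame {salient, not salient}. Each map is read as a basic
    probability assignment with a fixed mass `ignorance` left on the
    whole frame of discernment (the 0.1 default is an assumption)."""
    s1 = m_spatial * (1.0 - ignorance)
    n1 = (1.0 - m_spatial) * (1.0 - ignorance)
    s2 = m_temporal * (1.0 - ignorance)
    n2 = (1.0 - m_temporal) * (1.0 - ignorance)
    u = ignorance
    conflict = s1 * n2 + n1 * s2  # mass landing on the empty set
    # Dempster's rule: combine masses supporting 'salient', then
    # renormalise by the non-conflicting mass 1 - K.
    return (s1 * s2 + s1 * u + u * s2) / (1.0 - conflict + 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame_y = rng.integers(0, 256, size=(288, 352))  # CIF luminance plane
    mvx = rng.normal(size=(18, 22))                  # one MV per 16x16 block
    mvy = rng.normal(size=(18, 22))

    s_map = spatial_saliency(frame_y)
    t_map = np.kron(temporal_saliency(mvx, mvy), np.ones((16, 16)))
    final = dempster_shafer_fuse(s_map, t_map[:288, :352])
    print(final.shape, float(final.min()), float(final.max()))
```

The fusion step is Dempster's rule for a two-element frame of discernment: masses supporting "salient" combine multiplicatively and the conflicting mass is renormalised away, so regions flagged by both the spatial and the motion cue are reinforced while a single noisy cue is discounted.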




Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable feedback, which helped improve the paper.

Funding

This research work was supported by SERB, Government of India, under grant no. ECR/2016/000112.

Author information

Corresponding author

Correspondence to Manish Okade.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sandula, P., Okade, M. Robust spatio-temporal saliency estimation method for H.264 compressed videos. Multimed Tools Appl 81, 39021–39039 (2022). https://doi.org/10.1007/s11042-022-13148-9
