Video retrieval using salient foreground region of motion vector based extracted keyframes and spatial pyramid matching

Multimedia Tools and Applications

Abstract

Despite enormous effort by the research community, effective and precise video matching and retrieval among heterogeneous videos in large-scale video repositories remains a complex and challenging task. Addressing it requires a content-based video retrieval technique that exploits the visual content of videos for effective retrieval from video repositories. We propose a computer-assisted video retrieval technique that retrieves visually similar videos stored in the repositories. To accomplish this, motion-vector-based video summarization selects keyframes based on similar segments. To estimate the video content, salient foreground extraction is performed, and spatial pyramid matching is used to match the keyframe features of the query video against those of videos in the repositories. The salient foreground extraction makes two main contributions toward superior saliency map generation. First, it heuristically integrates the regional property, contrast, and foreground descriptors. Second, it introduces a new feature vector that characterizes the foreground as an object descriptor. Spatial pyramid matching, in turn, extends the orderless bag-of-features representation, which performs well in scene categorization. Retrieval performance is compared with standard state-of-the-art techniques using real-time datasets. Experimental and usability studies give satisfactory video retrieval results on evaluation metrics such as video sampling error, fidelity, precision, and recall.
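The spatial pyramid matching step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' configuration, so the code below assumes the commonly used formulation: visual-word histograms are computed over grids of 2^l x 2^l cells for levels l = 0..L, weighted so that finer levels count more, and compared by histogram intersection. The function names `pyramid_histogram` and `spm_kernel`, the normalized coordinates, and the default of two pyramid levels are illustrative assumptions, not details from the paper.

```python
import numpy as np

def pyramid_histogram(xy, words, vocab_size, levels=2):
    """Concatenate weighted per-cell visual-word histograms over pyramid levels.

    xy: (n, 2) feature positions normalized to [0, 1] (assumed input format).
    words: (n,) integer visual-word indices in [0, vocab_size).
    """
    xy = np.asarray(xy, dtype=float)
    words = np.asarray(words, dtype=int)
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Assumed standard weighting: level 0 gets 1/2^L, level l >= 1 gets 1/2^(L-l+1).
        if level == 0:
            weight = 1.0 / 2 ** levels
        else:
            weight = 1.0 / 2 ** (levels - level + 1)
        # Grid cell of each feature along x and y at this resolution.
        cx = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        # One histogram bin per (cell, visual word) pair.
        flat = (cy * cells + cx) * vocab_size + words
        hist = np.bincount(flat, minlength=cells * cells * vocab_size)
        parts.append(weight * hist)
    return np.concatenate(parts)

def spm_kernel(h1, h2):
    """Histogram intersection of two weighted pyramid histograms."""
    return float(np.minimum(h1, h2).sum())

# Two toy keyframes with the same visual words but swapped positions.
frame_a = pyramid_histogram([[0.1, 0.1], [0.9, 0.9]], [0, 1], vocab_size=2)
frame_b = pyramid_histogram([[0.9, 0.9], [0.1, 0.1]], [0, 1], vocab_size=2)
print(spm_kernel(frame_a, frame_a), spm_kernel(frame_a, frame_b))
```

Because the level weights sum to one, a keyframe matched against itself scores its feature count, while the swapped layout only matches at the coarsest level. This is the sense in which the pyramid adds spatial information to an otherwise orderless bag of features.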



Author information

Corresponding author

Correspondence to Ajay Kumar Mallick.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mallick, A.K., Mukhopadhyay, S. Video retrieval using salient foreground region of motion vector based extracted keyframes and spatial pyramid matching. Multimed Tools Appl 79, 27995–28022 (2020). https://doi.org/10.1007/s11042-020-09312-8

  • DOI: https://doi.org/10.1007/s11042-020-09312-8
