Content based video retrieval system using two stream convolutional neural network

Multimedia Tools and Applications

Abstract

Capturing video with mobile phones and digital cameras and uploading it to social media is now a trend. These videos do not have semantic tags, which makes them difficult for web users to search. Content Based Video Retrieval (CBVR) helps to identify the most relevant videos for a given video query. The objective of this paper is to retrieve the most relevant videos for a given query video in reduced time. To meet this objective, the paper proposes an efficient video retrieval system that uses salient object detection and keyframe extraction to reduce the high dimensionality of video data. Spatio-temporal features are extracted using a two-stream Convolutional Neural Network (CNN) and stored in a feature dataset. The salient objects are used to search for the exact subject given in the query. Relevant videos are identified by matching the features of the query video against the feature dataset created from the input dataset. To reduce the complexity of similarity matching, the proposed method replaces the feature dataset with a classification score dataset. Experiments are conducted on the TRECVID and CC_Web_Video datasets and evaluated using precision, recall, specificity, accuracy and F-score. Compared with recent methods, the proposed method achieves approximately 99.68% precision on the TRECVID dataset and 88.9% precision on the CC_Web_Video dataset. It outperforms the most recent methods with a 0.001 increase in mean Average Precision (mAP) on the CC_Web_Video dataset and a 4% increase in precision on the TRECVID dataset. The computation time is reduced by 100 min on the TRECVID dataset and 200 min on the CC_Web_Video dataset.
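
The five evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It uses the standard confusion-matrix definitions of precision, recall, specificity, accuracy and F-score; the abstract does not spell out the exact formulas used in the paper, so these definitions are an assumption. The names `RetrievalCounts`, `safe_div` and `evaluate` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class RetrievalCounts:
    """Outcome counts for one retrieval experiment (hypothetical container)."""
    tp: int  # relevant videos that were retrieved
    fp: int  # irrelevant videos that were retrieved
    fn: int  # relevant videos that were missed
    tn: int  # irrelevant videos that were correctly not retrieved


def safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(c: RetrievalCounts) -> dict:
    """Compute the standard confusion-matrix measures (assumed definitions)."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    specificity = safe_div(c.tn, c.tn + c.fp)
    accuracy = safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)
    # F-score here is the harmonic mean of precision and recall (F1).
    f_score = safe_div(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the paper.
    for name, value in evaluate(RetrievalCounts(tp=8, fp=2, fn=2, tn=88)).items():
        print(f"{name}: {value:.4f}")
```

The zero-denominator guard is a design choice of this sketch: it lets a query with no retrieved or no relevant videos be scored as 0.0 rather than raising an error.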

Author information

Corresponding author

Correspondence to S. Sowmyayani.

Ethics declarations

Conflict of interest

This work, entitled “Content Based Video Retrieval System Using Two Stream Convolutional Neural Network”, has not been submitted anywhere else. All content used in this research is original and not copied. The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sowmyayani, S., Rani, P.A.J. Content based video retrieval system using two stream convolutional neural network. Multimed Tools Appl 82, 24465–24483 (2023). https://doi.org/10.1007/s11042-023-14784-5

  • DOI: https://doi.org/10.1007/s11042-023-14784-5
