Abstract
Capturing video with mobile phones and digital cameras and uploading it to social media is now commonplace. These videos usually carry no semantic tags, which makes them difficult for web users to search. Content Based Video Retrieval (CBVR) identifies the videos most relevant to a given query video. The objective of this paper is to retrieve the most relevant videos for a given query video in reduced time. To meet this objective, the paper proposes an efficient video retrieval system that uses salient object detection and keyframe extraction to reduce the high dimensionality of video data. Spatio-temporal features are extracted using a two-stream Convolutional Neural Network (CNN) and stored in a feature dataset. The salient objects are used to search for the exact subject given in the query. Relevant videos are identified by similarity matching between the feature dataset, which is created from the input dataset, and the features of the query video. To reduce the complexity of similarity matching, the proposed method replaces the feature dataset with a classification score dataset. Experiments are conducted on the TRECVID and CC_Web_Video datasets and evaluated using precision, recall, specificity, accuracy and F-score. Compared with recent methods, the proposed method achieves a precision rate of approximately 99.68% on the TRECVID dataset and 88.9% on the CC_Web_Video dataset. It outperforms the most recent methods by a 0.001 increase in mean Average Precision (mAP) on the CC_Web_Video dataset and a 4% increase in precision rate on the TRECVID dataset. The computation time is reduced by 100 min on the TRECVID dataset and 200 min on the CC_Web_Video dataset.
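The matching and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each video is already represented by a vector of classification scores, and that cosine similarity is the matching measure; the abstract does not name one. The function names (`cosine_similarity`, `rank_videos`, `retrieval_metrics`) and the toy score vectors are hypothetical. The metric formulas are the standard definitions of precision, recall, specificity, accuracy and F-score.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length score vectors.

    Assumption: cosine similarity stands in for the paper's matching
    measure, which the abstract does not specify.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_videos(query_scores, score_dataset, top_k=3):
    """Return the ids of the top_k videos most similar to the query.

    score_dataset is a hypothetical mapping from video id to its
    classification score vector.
    """
    ranked = sorted(
        score_dataset.items(),
        key=lambda item: cosine_similarity(query_scores, item[1]),
        reverse=True,
    )
    return [video_id for video_id, _ in ranked[:top_k]]


def retrieval_metrics(tp, fp, tn, fn):
    """Standard metrics computed from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_score": f_score,
    }


# Toy data for illustration only; these are not values from the paper.
score_dataset = {
    "v1": [0.9, 0.1, 0.0],
    "v2": [0.1, 0.8, 0.1],
    "v3": [0.8, 0.2, 0.0],
}
query_scores = [1.0, 0.0, 0.0]
top_two = rank_videos(query_scores, score_dataset, top_k=2)
metrics = retrieval_metrics(tp=8, fp=2, tn=8, fn=2)
```

Comparing short score vectors instead of full spatio-temporal feature vectors keeps each comparison cheap, which is one plausible reading of how the classification score dataset reduces matching complexity.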
Ethics declarations
Conflict of interest
This work, entitled “Content Based Video Retrieval System Using Two Stream Convolutional Neural Network”, has not been submitted anywhere else. All content used in this research is original and has not been copied. The authors declare no conflict of interest.
About this article
Cite this article
Sowmyayani, S., Rani, P.A.J. Content based video retrieval system using two stream convolutional neural network. Multimed Tools Appl 82, 24465–24483 (2023). https://doi.org/10.1007/s11042-023-14784-5
DOI: https://doi.org/10.1007/s11042-023-14784-5