A study on video semantics; overview, challenges, and applications

Multimedia Tools and Applications

Abstract

The growth of surveillance systems has produced a massive increase in surveillance data, and the key challenge for video surveillance systems is now analyzing these large video clips. There is therefore an enormous demand for intelligent video analysis systems capable of identifying activities and events. Many researchers have emphasized the role of contextual knowledge and how it has improved the performance of video content analysis in several ways, so in this study we examine approaches that can extract semantic information from video at a level approaching human perception. We also address open problems in semantics that arise from event detection and irregular activity detection. Most methods and models are too coarse to accurately extract a complete set of information, so a machine-readable format is needed to view, process, store, and extract meaningful information from video data. In this paper, we discuss methods for extracting low-level, mid-level, and high-level video features and their representation using Semantic Technologies. A taxonomy of hierarchical feature generation approaches is also provided. Some evaluation metrics for video activity analysis and for measuring the performance of feature extraction are explored. Community-approved benchmark datasets are also thoroughly surveyed and presented. The paper provides a complete framework of video research for developing an intelligent surveillance system.

References

  1. Aafaq N, Mian A, Liu W, Gilani SZ, Shah M (2019) Video description: a survey of methods, datasets, and evaluation metrics. ACM Comput Surv (CSUR) 52(6):1–37

  2. Ahmed SA, Dogra DP, Kar S, Roy PP (2018) Trajectory-based surveillance analysis: a survey. In: IEEE Transactions on Circuits and Systems for Video Technology 29(7):1985–1997

    Google Scholar 

  3. Ahsan U, Sun C, Hays J, Essa I (2017) Complex event recognition from images with few training examples, In: Proc. of IEEE Winter Conf. Appl. Comput. Vision, WACV 2017, pp. 669–678

  4. Akdemir U, Turaga P, Chellappa R (2008) An ontology based approach for activity recognition from video. In: ACM international conference on Multimedia, pp. 709–712

  5. Ali H, Sharif M, Yasmin M et al (2020) A survey of feature extraction and fusion of deep learning for detection of abnormalities in video endoscopy of gastrointestinal-tract. Artif Intell Rev 53:2635–2707

    Article  Google Scholar 

  6. Aljaloud AS, Ullah H (2021) IA-SSLM: Irregularity-Aware Semi-Supervised Deep Learning Model for Analyzing Unusual Events in Crowds. IEEE Access 9:73327–73334

  7. Anjulan A, Canagarajah N (2009) A unified framework for object retrieval and mining. IEEE Trans Circuits Syst Video Technol 19(1):63–76

    Google Scholar 

  8. AR Z, MS Khurram Soomro (2012) UCF101: A dataset of 101 human action classes from videos in the wild

  9. Arbeláez P, Pont-Tuset J, Barron JT, Marques F, Malik J (2014) Multiscale combinatorial grouping. In: IEEE conference on computer vision and pattern recognition, pp. 328–335

  10. Arroyo R, Yebes JJ, Bergasa LM, Daza IG, Almazán J (2015) Expert video-surveillance system for real-time detection of suspicious behaviors in shopping malls. Expert Syst Appl

  11. Babenko A, Slesarev A, Chigorin A, Lempitsky V (2014) Neural codes for image retrieval, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

  12. Bai L, Lao S, Jones GJF, Smeaton AF (2007) Video semantic content analysis based on ontology, in International Machine Vision and Image Processing Conference, IMVIP 2007, 2007

  13. Baradel F, Wolf C, Mille J, Taylor GW (2018) Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points. In: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 469–478

  14. Bell S, Zitnick CL, Bala K, Girshick R (2016) Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks.In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

  15. Bellamine I, Tairi H (2014) Motion detection using the space-time interest points. J Comput Sci 10(5), 828

  16. Bellamine I, Tairi H, (2015) Motion detection using color structure-texture image decomposition. In: Intell. Comput. Vision, ISCV, Syst, p 2015

  17. Ben Mabrouk A, Zagrouba E (2017) Spatio-temporal feature using optical flow based distribution for violence detection, Pattern Recognit. Lett., vol. 92, pp. 62–67

  18. Ben Mabrouk A, Zagrouba E (2018) Abnormal behavior recognition for intelligent video surveillance systems: a review. Expert Syst Appl 91:480–491

    Google Scholar 

  19. Bermejo Nievas E, Deniz Suarez O, Bueno García G, Sukthankar R (2011) Violence detection in video using computer vision techniques. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

  20. Bernardin K, Stiefelhagen R (2008) Evaluating multiple object tracking performance: The CLEAR MOT metrics. Eurasip J Image Video Process

  21. Bewley A, Ge Z, Ott L, Ramos F,Upcroft B (2016) Simple online and realtime tracking, Proc. - Int. Conf. Image Process. ICIP, vol. 2016-Augus, pp. 3464–3468

  22. Bhattacharya S, Kalayeh MM, Sukthankar R, Shah M (2014) Recognition of complex events: Exploiting temporal dynamics between underlying concepts. In: IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 2243–2250

  23. Bizer C, Heath T, Berners-Lee T (2011) Linked data: The story so far. In Semantic services, interoperability and web applications: emerging concepts (pp. 205–227). IGI Global

  24. Bottazzi E, Ferrario R (2009) Preliminaries to a DOLCE ontology of organisations. Int J Bus Process Integr Manag 4(4):225–238

    Google Scholar 

  25. Bouindour S, Hittawe MM, Mahfouz S, Snoussi H (2018) Abnormal Event Detection Using Convolutional Neural Networks and 1-Class SVM classifier, pp. 1–6

  26. Burl MC (2004) Mining Patterns of Activity from Video Data, In: SIAM Int. Conf. Data Min., pp. 532–536

  27. Cao Z, Simon T, Wei SE, Sheikh Y (2017) Realtime multi-person 2D pose estimation using part affinity fields, In: 30th IEEE Conference on Computer Vision and Pattern Recognition

  28. Carreira J, Zisserman A, Vadis Q (2017) action recognition? A new model and the kinetics dataset. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol. 2017-Janua, pp. 4724–4733

  29. Caruccio L, Polese G, Tortora G, Iannone D (2019) EDCAR: A knowledge representation framework to enhance automatic video surveillance. Expert Syst Appl

  30. Cavaliere D, Senatore S, Vento M, Loia V (2016) Towards semantic context-Aware drones for aerial scenes understanding. In: IEEE Int. Conf. Adv. Video Signal Based Surveillance, AVSS 2016, no. August, pp. 115–121

  31. Cong Y, Yuan J, Liu J (2013) Abnormal event detection in crowded scenes using sparse representation. In: Pattern Recognit 46(7):1851–1864

    Google Scholar 

  32. Chen L, Nugent C (2009) Ontology-based activity recognition in intelligent pervasive environments. Int J Web Inf Syst

  33. Chen K, Zhang D, Yao L, Guo B, Yu Z, Liu Y (2021) Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges, and Opportunities. ACM Comput Surv (CSUR) 54(4):1–40

    Google Scholar 

  34. Choudhary A, Chaudhury S, Banerjee S (2008) A framework for analysis of surveillance videos. In: 2008 Sixth Indian Conf. Comput. Vision, Graph. Image Process., pp 344–351

  35. Cisco Visual Networking Index: Forecast and Methodology (2016–2021). In: Cisco Public White Pap, pp. 2016–2021

  36. Cortes C, Vapnik V, Support-Vector Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

  37. Crowley JL, Reignier P, Pesnel S (2005) CAVIAR Context Aware Vision using Image-based Active Recognition

  38. Cutler R, Davis LS (2000) Robust real-time periodic motion detection, analysis, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 22(8):781–796

    Google Scholar 

  39. Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. In: Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS’16). Curran Associates Inc., Red Hook, NY, USA, 379–387

  40. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05). San Diego, vol 1, pp 886–893

  41. Dendorfer P, Rezatofighi H, Milan A, Shi J, Cremers D, Reid I, Roth S, Schindler K, Leal-Taixé L (2020) MOT20: A benchmark for multi object tracking in crowded scenes. arXiv:2003.09003[cs], (arXiv: 2003.09003)

  42. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: A large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition

  43. Dhiman C, Vishwakarma DK (2019) A review of state-of-the-art techniques for abnormal human activity recognition. Eng Appl Artif Intell 1(77):21–45

    Google Scholar 

  44. Du M, Yuan X (2021) A survey of competitive sports data visualization and visual analysis. J Vis 24(1):47–67

    Google Scholar 

  45. Duong TH, Nguyen NT, Truong HB, Nguyen VH (2015) A collaborative algorithm for semantic video annotation using a consensus-based social network analysis. Expert Syst Appl 42(1):246–258

    Google Scholar 

  46. Elleuch N, Zarka M, Ben Ammar A, Alimi MA (2011) A fuzzy ontology: based framework for reasoning in visual video content analysis and indexing. In: Proc. Elev. Int. Work. Multimed. Data Min., p. 1

  47. Erhan D, Szegedy C, Toshev A, Anguelov D (2014) Scalable Object Detection Using Deep Neural Networks In: IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, pp. 2155–2162

  48. Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2010) The pascal visual object classes (VOC) challenge. Int J Comput Vis 88(2):303–338

    Google Scholar 

  49. Everingham M, Eslami SMA, Van Gool L, Williams CKI, Winn J, Zisserman A (2015) Int J Comput Vis 111(1):98–136

    Google Scholar 

  50. Fan J, Zhu X, Hacid MS, Elmagarmid AK (2002) Model-based video classification toward hierarchical representation, indexing and access. Multimed Tools Appl 17(1):97–120

    Google Scholar 

  51. Fan J, Luo H, Gao Y, Jain R (2007) Incorporating concept ontology for hierarchical video classification, annotation, and visualization. IEEE Trans. Multimed. 9(5):939–957

    Google Scholar 

  52. Felzenszwalb PF, Society IC, Girshick RB, Member S, Mcallester D, Ramanan D (2010) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645

    Google Scholar 

  53. Feng W, Zhihao H, Wei W, Junjie Y, Wanli O (2019) Multi-object tracking with multiple cues and switcher-aware classification. arXiv preprint arXiv:1901.06129

  54. Ferryman J (2006) PETS 2006 Benchmark Data, In: Conjunction with IEEE Conference on Computer Vision and Pattern Recognition 2006 New York, USA - 18 June 2006. [Online]. Available: http://www.cvg.reading.ac.uk/PETS2006/data.html

  55. Freund Y (1997) Schapire RE. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting 139:119–139

    Google Scholar 

  56. Fiaz M, Mahmood A, Jung SK (2018) Tracking noisy targets: A review of recent object tracking approaches. arXiv preprint arXiv:1802.03098

  57. Fu CFC, Li GLG, Dai KDK (2005) A framework for video structure mining. In: 2005 Int. Conf. Mach. Learn. Cybern., vol 3, no August, pp 1524–1528

  58. Fu CY, Liu W, Ranga A, Tyagi A, Berg AC, Dssd: Deconvolutional single shot detector, arXiv preprint arXiv:1701.06659. 2017 Jan 23

  59. G A, A B, K C, Y L, J F, A G, A D, J Z, E G, L D, AF S, Y G, W K, Quénot G (2019) An evaluation campaign to benchmark Video Activity Detection. Video Captioning and Matching, and Video Search & retrieval, in Proceedings of TRECVID 2019

  60. Gan C, Wang N, Yang Y, Yeung DY, Hauptmann AG (2015) DevNet: A Deep Event Network for multimedia event detection and evidence recounting. In: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 07-12-June, pp. 2568–2577

  61. Gao Y, Liu H, Sun X, Wang C, Liu Y (2016) Violence detection using Oriented VIolent Flows. Image Vis Comput 48-49:37-41

  62. García A, Bescós J, Video object segmentation based on feedback schemes guided by a low-level scene ontology. In: Proceedings of the 10th international conference on advanced concepts for intelligent vision systems, Springer, Berlin, ACIVS ’08, pp 322–333

  63. Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Martinez-Gonzalez P, Garcia-Rodriguez J (2018) A survey on deep learning techniques for image and video semantic segmentation. Appl Soft Comput 70:41–65

    Google Scholar 

  64. Géczy P, Izumi N, Akaho S, Hasida K (2008) Advances in data mining. Medical Applications, E-Commerce, Marketing, and Theoretical Aspects, vol 5077

  65. Girshick R (2015) Fast R-CNN, In: IEEE International Conference on Computer Vision (ICCV), Santiago, pp. 1440–1448

  66. Girshick R, Donahue J, Darrell T, Berkeley UC (2012) J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, pp 2–9

    Google Scholar 

  67. Girshick R (2015) Fast R-CNN. In: IEEE Int. Conf. Comput. Vis. pp. 1440–1448

  68. Girshick R, Donahue J, Darrell T, Malik J (2016) R-CNN: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587

  69. Gömez-Romero J, Patricio MA, García J, Molina JM (2011) Ontology-based context representation and reasoning for object tracking and scene interpretation in video. Expert Syst Appl 38(6):7494–7510

    Google Scholar 

  70. Grassi M, Morbidoni C, Nucci M (2012) A Collaborative Video Annotation System Based on Semantic Web Technologies. Cognit Comput 4(4):497–514

    Google Scholar 

  71. Greco L, Ritrovato P, Saggese A, Vento M (2016) Abnormal Event Recognition: A Hybrid Approach Using SemanticWeb Technologies, In: IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Work 1:1297–1304

  72. Greco L, Ritrovato P, Saggese A, Vento M (2016b) Improving reliability of people tracking by adding semantic reasoning. In: IEEE international conference on advanced video and signal based surveillance (AVSS), pp 194–199

  73. Gruber TR (1995) Toward principles for the design of ontologies used for knowledge sharing. Int J Hum Comput Stud

  74. Guntuboina C, Porwal A, Jain P, Shingrakhia H (2021) Deep Learning Based Automated Sports Video Summarization using YOLO. Electronic Letters on Computer Vision and Image Analysis 20(1):99–116

    Google Scholar 

  75. Hamid R, Maddi S, Bobick A, Essa I (2007) Structure from statistics - Unsupervised activity analysis using suffix trees, In: Proc. IEEE Int. Conf. Comput. Vis

  76. Harikrishna N, Satheesh S, Sriram SD, Easwarakumar KS (2011) Temporal classification of events in cricket videos. In: 2011 Natl. Conf. Commun. NCC 2011, pp 14–18

  77. Hassan MM, Ullah S, Hossain MS, Alelaiwi A (2021) An end-to-end deep learning model for human activity recognition from highly sparse body sensor data in internet of medical things environment. The Journal of Supercomputing 77:2237–2250

    Google Scholar 

  78. Hauptmann A, Yan R, Lin WH, Christel M, Wactlar H (2007) Can high-level concepts fill the semantic gap in video retrieval? A case study with broadcast news. IEEE Trans. Multimed. 9(5):958–966

    Google Scholar 

  79. He K, Zhang X, Ren S, Sun J (2015) SppNet. IEEE Trans Pattern Anal Mach Intell

  80. He K, Zhang X, Ren S, Sun J (2015) Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904-1916

  81. He K, Zhang X, Ren S, Sun J (2016) ResNet. In: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit

  82. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: IEEE International Conference on Computer Vision (ICCV), Venice, pp. 2980–2988

  83. He D, Li F, Zhao Q, Long X, Fu Y, Wen S (2018) Exploiting Spatial-Temporal Modelling and Multi-Modal Fusion for Human Action Recognition

  84. Himanshu R, Maheshkumar H,Kolekar, Keshav N, Mukherjee JK (2015) Trajectory based unusual human movement identification for video surveillance system. In Progress in Systems Engineering, pp. 789–794. Springer, Cham

  85. Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks, Science (80-. )

  86. Hinton GE, Krizhevsky A, Wang SD (2011) Transforming auto-encoders. In: International conference on artificial neural networks, pp. 44–51. Springer, Berlin, Heidelberg

  87. Hongeng S, Bremond F, Nevatia R (2000) Representation and optimal recognition of human activities. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 818–825

  88. Huang JF, Chen SL (2014) Detection of violent crowd behavior based on statistical characteristics of the optical flow. In: 2014 11th Int. Conf. Fuzzy Syst. Knowl. Discov. FSKD 2014, pp 565–569

  89. Huang JH, Murn L, Mrak M, Worring M, (2021) GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization. arXiv preprint arXiv:2104.12465

  90. Hunter J (2001) Adding multimedia to the semantic web: building an MPEG-7 ontology. In: Proceedings of the First International Conference on Semantic Web Working (SWWS’01), CEUR-WS.org, Aachen, DEU, 261–283

  91. Hussain T, Muhammad K, Ding W, Lloret J, Baik SW, de Albuquerque VHC (2021) A comprehensive survey of multi-view video summarization. In: Pattern Recognition 109:107567

    Google Scholar 

  92. Ji X, Zuo X, Wang C, Wang Y (2015) A simple human interaction recognition based on global gist feature model. International conference on intelligent robotics and applications. Springer, Cham, pp 487–498

    Google Scholar 

  93. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: Convolutional Architecture for Fast Feature Embedding. In Proceedings of the 22nd ACM international conference on Multimedia (MM’14). Association for Computing Machinery, New York, NY, USA, 675–678

  94. Joao Carreira AZ, Noland E, Hillier C (2019) A Short Note on the Kinetics-700 Human Action Dataset

  95. Jordan Michael I, Zoubin Ghahramani, Jaakkola Tommi S, Saul Lawrence K (1999) An introduction to variational methods for graphical models. Mach Learn 37(2):183–233

  96. Kavukcuoglu K, Ranzato M, Fergus R, LeCun Y (2009) Learning invariant features through topographic filter maps, 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, pp. 1605-1612

  97. Kavukcuoglu K, Sermanet P, Boureau Y, LeCun Y, Gregor K, Mathieu M (2010) Learning Convolutional Feature Hierarchies for Visual Recognition, NIPS

  98. Kim J, Grauman K (2009) Observe locally, infer globally: a space-time MRF for detecting abnormal activities with incremental updates. In: 2009 IEEE computer society conference on computer vision and pattern recognition workshops. CVPR Workshops 2009

  99. Kliper-Gross O, Hassner T, Wolf L (2012) The action similarity labeling challenge. IEEE Trans Pattern Anal Mach Intell

  100. Kompatsiaris I, Mezaris V, Strintzis MG (2005) Multimedia content indexing and retrieval using an object ontology. Multimedia content and semantic web-methods, standards and tools. Wiley, Hoboken, pp 339–371

  101. Kong T, Yao A, Chen Y, Sun F (2016) HyperNet: Towards accurate region proposal generation and joint object detection, In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

  102. Kotsiantis S, Kanellopoulos D, Pintelas P (2004) Multimedia mining. WSEAS Trans Syst 3(10):3263–3268

    Google Scholar 

  103. Krishna R et al (2017) Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. Int J Comput Vis 123(1):32–73

    MathSciNet  Google Scholar 

  104. Krizhevsky A, Sutskever I (2012) Hinton GE (2012) AlexNet. Neural Inf. Process. Syst p Adv

  105. Krizhevsky A, Sutskever I, GE H (2012) ImageNet Classification with Deep Convolutional Neural Networks, Advances in neural network.pp. 1–9

  106. Kuehne H, Jhuang H, Stiefelhagen R, Serre Thomas T (2013) Hmdb51: a large video database for human motion recognition, in High Performance Computing in Science and Engineering 12: Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2012

  107. Kuo W, Hariharan B, Malik J (2015) Deepbox: Learning objectness with convolutional networks. In: IEEE international conference on computer vision, pp. 2479–2487

  108. Leach M, Baxter R, Robertson N, Sparks E (2014) Detecting social groups in crowded surveillance videos using visual attention, IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Work., pp. 467–473

  109. Leal-Taixé L, Milan A, Rei I, Roth S, SchindlerK (2015) MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking. arXiv:1504.01942 [cs], (arXiv: 1504.01942)

  110. Lecun Y, Bengio Y, Hinton G (2015) Deep learning. nature 521(7553): 436–444

  111. Lee SC, Nevatia R (2014) Hierarchical abnormal event detection by real time and semi-real time multi-tasking video surveillance system. Mach Vis Appl 25(1):133–143

  112. Leo M, Furnari A, Medioni GG, Trivedi M, Farinella GM (2019) Deep learning for assistive computer vision. In: Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 11134 LNCS, pp. 3–14

  113. Li Y, Huang C, Nevatia R (2009) Learning to associate: Hybridboosted multi-target tracker for crowded scene. In: IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Work. CVPR Work. 2009, vol. 2009 IEEE, pp. 2953–2960

  114. Li C, Han Z, Ye Q, Jiao J (2013) Visual abnormal behavior detection based on trajectory sparse reconstruction analysis. Neurocomputing 119:94–100

  115. Li X, Zhao B, Lu X (2017) A general framework for edited video and raw video summarization. IEEE Transactions on Image Processing 26(8):3652–3664

    MathSciNet  MATH  Google Scholar 

  116. Li T, Chen X, Zhu F, Zhang Z, Yan H (2021) Two-stream deep spatial-temporal auto-encoder for surveillance video abnormal event detection. Neurocomputing 439:256–270

  117. Liao W, Yang C, Ying Yang M, Rosenhahn B (2017) Security event recognition for visual surveillance. ISPRS Ann Photogramm Remote Sens Spat Inf Sci 4(1W1):19–26

  118. Lienhart R, Maydt J (2002) An extended set of Haar-like features for rapid object detection. In: International conference on image processing. Proceedings, Rochester, NY, USA, pp I–I

  119. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: European conference on computer vision, pp. 740–755. Springer, Cham

  120. Lin T, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature Pyramid Networks for Object Detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, pp. 936–944

  121. Liu H, Chen S, Kubota N (2013) Intelligent video systems and analytics: a survey. IEEE Transactions on Industrial Informatics 9(3):1222–1233

    Google Scholar 

  122. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: Single shot multibox detector. In: European conference on computer vision, pp. 21–37. Springer, Cham

  123. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60:91–110

    Google Scholar 

  124. Mahmood, K, Takahashi H (2015) Cloud based sports analytics using semantic Web tools and technologies. In 2015 IEEE 4th Global Conference on Consumer Electronics (GCCE), pp. 431–433. IEEE

  125. Markowska-Kaczmar U, Kwasnicka H (2018) Deep learning: a new era in bridging the semantic gap. Bridging the semantic gap in image and video analysis 2018, Springer, Cham, pp 123–159

    Google Scholar 

  126. Meditskos G, Kompatsiari, iknow: ontology-driven situational awareness for the recognition of activities of daily living. Pervasive Mobile Comput 40:17–41. In the same way, Meditskos and Kompatsiaris (2017)

  127. Meditskos G, Dasiopoulou S, Efstathiou V, Kompatsiaris I (2013) SP-ACT: A hybrid framework for complex activity recognition combining OWL and SPARQL rules, 2013 IEEE Int. Conf. Pervasive Comput. Commun. Work. PerCom Work. 2013, no. March, pp. 25–30

  128. Miao Y, Song J (2014) Abnormal event detection based on SVM in video surveillance. In: Proc. - 2014 IEEE Work. Adv. Res. Technol. Ind. Appl. WARTIA 2014, pp 1379–1383

  129. Milan A, Leal-Taixé L, Reid I, Roth S, Schindler K (2016) MOT16: A Benchmark for Multi-Object Tracking. arXiv:1603.00831 [cs], (arXiv: 1603.00831)

  130. Mitra S, Acharya T (2003) Data Mining: Concepts and Algorithms From Multimedia to Bioinformatics. 2003

  131. Monfort M et al (2018) Moments in Time Dataset: one million videos for event understanding. CoRR abs-1801.0:1–11

  132. Muneeb ul Hassan (2018) VGG16 - Convolutional Network for Classification and Detection, Neurohive

  133. Nabati M, Behrad A (2020) Multi-Sentence Video Captioning using Content-oriented Beam Searching and Multi-stage Refining Algorithm. Inf Process Manag 57(6):102302

  134. Najibi M, Rastegari M, Davis LS (2016) G-cnn: an iterative grid based object detector. In: IEEE conference on computer vision and pattern recognition, pp. 2369–2377

  135. Nallaivarothayan H, Fookes C, Denman S, Sridharan S (2014) An MRF based abnormal event detection approach using motion and appearance features. In: 11th IEEE Int. Conf. Adv. Video Signal-Based Surveillance, AVSS 2014, pp 343–348

  136. Naphade M et al (2006) Large-scale concept ontology for multimedia. IEEE Multimed. 13(3):86–91

    Google Scholar 

  137. Nevatia R, Hobbs J, Bolles B (2004) An ontology for video event representation. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Work

  138. Ngiam J, Khosla A, Kim M, Nam J, Lee H, Ng AY (2011) Multimodal deep learning. In: Proceedings of the 28th International Conference on International Conference on Machine Learning (ICML’11). Omnipress, Madison, WI, USA, 689–696

  139. Noh H, Hong S, Han B (2015) Learning deconvolution network for semantic segmentation, in Proceedings of the IEEE International Conference on Computer Vision,pp. 1520–1528

  140. OM P, A V, A Z, C V (n.d.) Jawahar, The Oxford-IIIT Pet Dataset. Available: https://www.robots.ox.ac.uk/vgg/data/pets/

  141. Oquab M, Bottou L (2014) Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks. In: IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1717–1724

  142. Oquab M et al (2015) Weakly supervised object recognition with convolutional neural networks, HAL Id: hal-01015140

  143. Ouyang W, Wang X, Zeng X, Qiu S, Luo P, Tian Y, Li H et al (2015) Deepid-net: Deformable deep convolutional neural networks for object detection. In: IEEE conference on computer vision and pattern recognition, pp. 2403–2412

  144. Pan J-Y, Faloutsos C (2002) GeoPlot: Spatial data mining on video libraries. In:Proc. Elev. Int. Conf. Inf. Knowl. Manag. (CIKM 2002), pp. 405–412

  145. Pantoja C, Ciapetti A, Massari C, Tarantelli M (2015) Action recognition in surveillance videos using semantic web rules. In: 6th international conference on imaging for crime prevention and detection (ICDP-15), pp 1–6

  146. Papadopoulos GT, Mezaris V, Kompatsiaris I, Strintzis MG (2007) Semantic multimedia: second international conference on semantic and digital media technologies, SAMT 2007, Genoa, Italy, December 5–7, 2007, Proceedings. Ontology-driven semantic video analysis using visual information objects. Springer, Berlin, pp 56–69

  147. Pareek P, Thakkar A (2021) A survey on video-based human action recognition: recent updates, datasets, challenges, and applications. Artif Intell Rev 54(3):2259–2322

    Google Scholar 

  148. Patel AS, Merlino G, Bruneo D, Puliafito A, Vyas OP, Ojha M (2021) Video representation and suspicious event detection using semantic technologies. Semantic Web 12(3):467–491

  149. Patino L, Cane T, Vallee A, Ferryman J (2016) PETS 2016: Dataset and Challenge, IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Work., pp. 1240–1247

  150. Patino L, Ferryman J (2014) PETS 2014: Dataset and challenge, in 11th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2014

  151. Petrucci G, Ghidini C, Rospocher M (2016) Ontology learning in the deep. In: European Knowledge AcquisitionWorkshop EKAW2016: Knowledge Engineering and Knowledge Management, pp. 480–495

  152. Pinheiro PO, Lin TY, Collobert R, Dollár P (2016) Learning to refine object segments. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

  153. Qiu Z, Yao T, Mei T (2017) Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks, Proc. IEEE Int. Conf. Comput. Vis., vol. 2017-Octob, pp. 5534–5542

  154. Quack T, Ferrari V, Van Gool L (2006) Video mining with frequent itemset configurations, Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 4071 LNCS, pp. 360–3696

  155. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEEConference on Computer Vision and Pattern Recognition pp. 779–788

  156. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: IEEE conference on computer vision and pattern recognition, pp. 7263–7271

  157. Redmon J, Farhadi A (2018) YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767

  158. Ren X, Ramanan D (2013) Histograms of Sparse Codes for Object Detection. In: IEEE Conf. Comput. Vis. Pattern Recognit., pp. 3246–3253

  159. Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149

  160. Ristani E, Solera F, Zou R, Cucchiara R, Tomasi C (2016) Performance measures and a data set for multi-target, multi-camera tracking, Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 9914 LNCS, no. c, pp. 17–35

  161. Ryoo MS, Matthies L (2013) First-person activity recognition: What are they doing to me?. In: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 2730–2737

  162. SanMiguel JC, Martínez JM, García Á (2009) An Ontology for Event Detection and its Application in Surveillance Video, IEEE Int. Conf. Adv. Video Signal-Based Surveill., pp. 220–225

  163. Sanmiguel JC, Martínez JM (2012) A semantic-based probabilistic approach for real-time video event recognition. Comput Vis Image Underst 116(9):937–952

    Google Scholar 

  164. Sanmiguel JC, Martínez JM (2013) A semantic-guided and self-configurable framework for video analysis. Mach Vis Appl 24(3):493–512

  165. Saini R, Ahmed A, Dogra DP, Roy PP (2018) Proceedings of 2nd International Conference on Computer Vision & Image Processing, vol. 703, pp. 261–271

  166. Saravanan D, Srinivasan S (2010) Data mining framework for video data. Recent Adv. Sp. Technol. Serv. Clim. Chang. 2010 (RSTS CC-2010), pp 167–170

  167. Sermanet P, Kavukcuoglu K, Chintala S, LeCun Y (2013) Pedestrian detection with unsupervised multi-stage feature learning. In: IEEE Conference on Computer Vision and Pattern Recognition

  168. Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2013) Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229

  169. Shen J, Tao D, Li X (2008) Modality mixture projections for semantic video event detection. IEEE Transactions on Circuits and Systems for Video Technology 18(11):1587–1596

  170. Shen J, Wang M, Chua TS (2016) Accurate online video tagging via probabilistic hybrid modeling. Multimedia Systems 22(1):99–113

  171. Shen Z, Liu Z, Li J, Jiang Y-G, Chen Y, Xue X (2017) Dsod: Learning deeply supervised object detectors from scratch. In: Proceedings of the IEEE international conference on computer vision, pp. 1919–1927

  172. Si Z, Pei M, Yao B, Zhu SC (2011) Unsupervised learning of event AND-OR grammar and semantics from video, In: Proc. IEEE Int. Conf. Comput. Vis., pp. 41–48

  173. Sikos LF, Powers DMW (2015) Knowledge-Driven Video Information Retrieval with LOD: From Semi-Structured to Structured Video Metadata, Proc. Eighth Work. Exploit. Semant. Annot. Inf. Retr., pp. 35–37

  174. Sikos LF (2016) A Novel Approach to Multimedia Ontology Engineering for Automated Reasoning over Audiovisual LOD Datasets, Springer-Verlag Berlin Heidelb, 9621:3–12

  175. Sikos LF (2017) Description logics in multimedia reasoning. In: Springer, Cham, ISBN: 978-3-319-54066-5

  176. Sikos LF (2018) VidOnt: a core reference ontology for reasoning over video scenes. J Inf Telecommun 1–13

  177. Sigari MH, Soltanian-Zadeh H, Pourreza HR (2016) A framework for dynamic restructuring of semantic video analysis systems based on learning attention control. Image Vis Comput 53:20–34

  178. Sivic J, Zisserman A (2004) Video data mining using configurations of viewpoint invariant regions, Proc. 2004 IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognition, CVPR 2004, pp. 488–495

  179. Smeulders AWM, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(12):1349–1380

  180. Snidaro L, Belluz M, Foresti GL (2007) Representing and recognizing complex events in surveillance applications, 2007 IEEE Conf. Adv. Video Signal Based Surveillance, AVSS 2007 Proc., pp. 493–498

  181. Snoek CGM, Huurnink B, Hollink L, De Rijke M, Schreiber M, Worring M (2007) Adding semantics to detectors for video retrieval. IEEE Transactions on Multimedia 9(5):975–986

  182. Sobhani F, Straccia U (2019) Towards a forensic event ontology to assist video surveillance-based vandalism detection. arXiv preprint arXiv:1903.09012

  183. Son J, Baek M, Cho M, Han B (2017) Multi-object tracking with quadruplet convolutional neural networks. In: 30th IEEE Conf. Comput. Vis. Pattern Recognition, pp. 3786–3795

  184. Sreeja MU, Kovoor BC (2021) A unified model for egocentric video summarization: an instance-based approach. Comput Electr Eng 1(92)

  185. Sreenu G, Durai MS (2019) Intelligent video surveillance: a review through deep learning techniques for crowd analysis. Journal of Big Data 6(1):48

  186. Stavropoulos TG, Meditskos G, Kompatsiaris I. Demaware 2: integrating sensors, multimedia and semantic analysis for the ambient care of dementia. Pervasive Mobile Comput 34:126–1

  187. Suresh V, Mohan CK, Kumaraswamy R, Yegnanarayana B (2005) Combining multiple evidence for video classification. In: Proc. - 2005 Int. Conf. Intell. Sens. Inf. Process. ICISIP’05, vol 2005, pp. 187–192

  188. Szegedy C, Toshev A, Erhan D (2013) Deep neural networks for object detection. In Advances in neural information processing systems, pp. 2553–2561

  189. Szegedy C et al. (2014) Going deeper with convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9

  190. Tani MYK, Lablack A, Ghomari A, Bilasco IM (2015) Events detection using a video-surveillance ontology and a rule-based approach, Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), 8926:299–308

  191. Tani MYK, Ghomari A, Lablack A, Bilasco IM (2017) OVIS: ontology video surveillance indexing and retrieval system. Int J Multimed Inf Retr 6(4):295–316

  192. Tasnim N, Islam MK, Baek JH (2021) Deep Learning Based Human Activity Recognition Using Spatio-Temporal Image Formation of Skeleton Joints. Appl Sci 11(6):2675

  193. Town C (2006) Ontological inference for image and video analysis. Mach Vis Appl 17(2):94–115

  194. 2014 TRECVID Multimedia Event Detection & Multimedia Event Recounting Tracks (2011)  Available: http://nist.gov/itl/iad/mig/med14.cfm

  195. Turaga PK, Veeraraghavan A, Chellappa R (2007) From videos to verbs: Mining videos for activities using a cascade of dynamical systems. In: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognition

  196. Uijlings JRR, Van De Sande KEA, Gevers T, Smeulders AWM (2012) Selective Search for Object Recognition

  197. Ullah A, Muhammad K, Ding W, Palade V, Haq IU, Baik SW (2021) Efficient activity recognition using lightweight CNN and DS-GRU network for surveillance applications. Appl Soft Comput 103:107102

  198. Vallet D, Castells P, Fernández M, Mylonas P, Avrithis Y (2007) Personalized content retrieval in context using ontological knowledge. IEEE Trans. Circuits Syst. Video Technol. 17(3):336–345

  199. Van de Sande K, Gevers T, Snoek C (2010) Evaluating Color Descriptors for Object and Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9):1582–1596

  200. Vijayakumar V, Nedunchezhian R (2012) A study on video data mining. Int J Multimed Inf Retr 1(3):153–172

  201. WADLEY FM (2006) Probit Analysis: A Statistical Treatment of the Sigmoid Response Curve. 2nd ed. D. J. Finney. New York-London: Cambridge Univ. Press, 1952. 318 pp. $7.00, Science (80-. )

  202. Wang H (2015) Semantic Deep Learning, University of Oregon, pp. 1–42

  203. Wang T, Snoussi H (2014) Detection of abnormal visual events via global. IEEE Trans Inf Forensics Secur 9(6):988–998

  204. Wang B, Li W, Yang W, Liao Q (2011) Illumination normalization based on weber’s law with application to face recognition. IEEE Signal Process Lett

  205. Wang M, Hong R, Li G, Zha ZJ, Yan S, Chua TS (2012) Event driven web video summarization by tag localization and key-shot identification. IEEE Transactions on Multimedia 14(4):975–985

  206. Wang X, Ji Q (2015) Video event recognition with deep hierarchical context model. In: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 07-12-June, pp. 4418–4427

  207. Wang L et al (2016) Temporal segment networks: Towards good practices for deep action recognition. In: Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 9912 LNCS, pp. 20–36

  208. Wang H, Dou D, Lowd D (2016) Ontology-based deep restricted boltzmann machine. In: 27th International Conference on Database and Expert Systems Applications, DEXA 2016, Porto, Portugal, September 5–8, 2016, Proceedings, Part I, pp. 431–445. Springer International Publishing

  209. Wang X, Girshick R, Gupta A, He K (2018) Non-local Neural Networks. In: Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 7794–7803

  210. Wojke N, Bewley A, Paulus D (2018) Simple online and realtime tracking with a deep association metric, Proc. - Int. Conf. Image Process. ICIP, vol. 2017-Septe, pp. 3645–3649

  211. Wu Z et al (2015) Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In: proceedings of the 23rd ACM international conference on Multimedia

  212. Wu G, Liu L, Guo Y, Ding G, Han J, Shen J, Shao L (2017) Unsupervised deep video hashing with balanced rotation. In: IJCAI

  213. Xie L, Sundaram H, Campbell M (2008) Event mining in multimedia streams. In: Proc. IEEE 96(4):623–647

  214. Xu Z, Mei L, Liu Y, Hu C (2013) Video structural description: a semantic based model for representing and organizing video surveillance big data. In: 2013 IEEE 16th international conference on computational science and engineering (CSE), IEEE, pp 802–809

  215. Xu Z, Liu Y, Mei L, Hu C, Chen L (2015) Semantic based representing and organizing surveillance big data using video structural description technology. J Syst Softw 102:217–225

  216. Xu D, Zhu Y, Choy CB, Fei-Fei L (2017) Scene graph generation by iterative message passing. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5410–5419

  217. Xuan Wang HC, Song H (2017) Pedestrian abnormal event detection based on multi-feature fusion in traffic video. Optik (Stuttg) 11(3):29–38

  218. Xue J, Li J, Gong Y (2013) Restructuring of deep neural network acoustic models with singular value decomposition, In: Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 2365–2369

  219. Yao BZ, Yang X, Lin L, Lee MW, Zhu SC (2010) I2t: image parsing to text description. In: Proc IEEE 98(8):1485–150

  220. Yoo D, Park S, Lee J-Y, Paek AS, Kweon IS (2015) Attentionnet: Aggregating weak directions for accurate object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2659–2667

  221. Yu J, Lee Y, Yow KC, Jeon M, Pedrycz W (2021) Abnormal event detection and localization via adversarial event prediction. IEEE Transactions on Neural Networks and Learning Systems

  222. Zablocki M, Gosciewska K, Frejlichowski D, Hofman R (2014) Intelligent video surveillance systems for public spaces-a survey. Journal of Theoretical and Applied Computer Science 8(4):13–27

  223. Zaidenberg S, Boulay B, Brémond F (2012) A generic framework for video understanding applied to group behavior recognition, Proc. - 2012 IEEE 9th Int. Conf. Adv. Video Signal-Based Surveillance, AVSS 2012, pp. 136–142

  224. Zeiler MD, Krishnan D, Taylor GW, Fergus R (2010) Deconvolutional networks. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, pp. 2528–2535

  225. Zhang T, Yang Z, Jia W, Yang B, Yang J, He X (2016) A new method for violence detection in surveillance scenes. Multimed Tools Appl 75(12):7327–7349

  226. Zhang T, Jia W, Yang B, Yang J, He X, Zheng Z (2017) MoWLD: a robust motion image descriptor for violence detection. Multimed Tools Appl 76(1):1419–1438

  227. Zhao Y, Qiao Y, Yang J, Kasabov N (2015) Abnormal activity detection using spatio-temporal feature and Laplacian sparse representation, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

  228. Zhao ZQ, Xie BJ, Cheung Y, Wu X (2015) Plant Leaf Identification via a Growing Convolution Neural Network with Progressive Sample Learning. In: Cremers D, Reid I, Saito H, Yang MH (eds) Computer Vision - ACCV 2014. Lecture Notes in Computer Science, vol 9004. Springer, Cham

  229. Zhang Y, Lin W, Zhang G, Luo C, Jiang D, Yao C (2014) A new approach for extracting and summarizing abnormal activities in surveillance videos, in 2014 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2014

  230. Zhang Y, Sohn K, Villegas R, Pan G, Lee (2015) Improving object detection with deep convolutional networks via bayesian optimization and structured prediction. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 249–258

  231. Zhang X et al (2018) Qiniu Submission to Activity Net Challenge. pp 1–4

  232. Zhou B, Andonian A, Oliva A, Torralba A (2018) Temporal Relational Reasoning in Videos. In: Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 11205 LNCS, pp. 831–846

  233. Zhu X, Wu X, Elmagarmid AK, Feng Z, Wu L (2005) Video data mining: semantic indexing and event detection from the association perspective. IEEE Trans Knowl Data Eng 17(5):665–667

  234. Zitnick CL, Dollár P (2014) Edge boxes: Locating object proposals from edges. In: European conference on computer vision, pp. 391–405. Springer, Cham

Acknowledgements

We thank Dr. Vivek Tiwari (Department of Computer Science and Engineering at International Institute of Information Technology Naya Raipur) for improving the technical writing and flow of the paper.

Author information

Corresponding author

Correspondence to Ashish Singh Patel.

About this article

Cite this article

Patel, A.S., Vyas, R., Vyas, O.P. et al. A study on video semantics; overview, challenges, and applications. Multimed Tools Appl 81, 6849–6897 (2022). https://doi.org/10.1007/s11042-021-11722-1

  • DOI: https://doi.org/10.1007/s11042-021-11722-1
