Towards machine vision-based video analysis in smart cities: a survey, framework, applications and open issues

  • 1229: Multimedia Data Analysis for Smart City Environment Safety
Multimedia Tools and Applications

Abstract

The multifarious video content generated by various applications is growing exponentially, resulting in huge volumes of data. Analyzing this content manually is cumbersome, particularly when key information must be extracted according to a given criterion. The concept of the smart city is popular across the world as a way to develop technology-driven cities where facilities are offered intelligently. In smart cities, video analysis may play a crucial role in several real-life applications such as smart security surveillance, traffic monitoring, video forensics, sports, entertainment, and medicine. Long real-time videos can be analyzed intelligently to detect key patterns of interest and yield a shorter, application-centric summary. Moreover, machine learning or deep learning paradigms can be applied to the video data generated by various smart applications to create real-time models. In this study, we explore the use of video analysis in various real-time applications in smart cities. The work aims to present a detailed investigation of computer vision-based video analysis approaches across various aspects of smart cities. In addition, a generic layered video analysis architecture is presented, which highlights the deployment of video analysis-centric approaches for real-life smart city facilities. Our analysis of the existing approaches demonstrates the pertinence of video analysis to many parts of a smart city's everyday infrastructure. However, the study also reveals numerous areas where video analysis is yet to be explored, which offers clear insight to researchers. In addition to these opportunities, our study identifies some open research challenges for the active research community. The survey can also serve as a reference for investigators as well as for the planning and development authorities of smart cities.

Author information

Corresponding author

Correspondence to Ambreen Sabha.

Ethics declarations

Conflict of interest

All the authors declare that they do not have any conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 8 Some symbols and their description

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sabha, A., Selwal, A. Towards machine vision-based video analysis in smart cities: a survey, framework, applications and open issues. Multimed Tools Appl (2023). https://doi.org/10.1007/s11042-023-16434-2

  • DOI: https://doi.org/10.1007/s11042-023-16434-2
