Human Detection in Surveillance Videos Based on Fine-Tuned MobileNetV2 for Effective Human Classification

Bouafia, Yassine; Guezouli, Larbi; Lakhlef, Hicham

doi:10.1007/s40998-022-00512-6

Research Paper
Published: 18 July 2022

Volume 46, pages 971–988, (2022)

Iranian Journal of Science and Technology, Transactions of Electrical Engineering

Abstract

With accident and crime rates high around the world, video surveillance is becoming more important every day, and intelligent surveillance systems are being developed to perform surveillance tasks automatically. Accurately detecting human beings in a visual surveillance system is crucial for diverse application areas. The first step in the detection process is to detect moving objects; each moving object can then be classified as either human or non-human. Human classification is therefore an important step in building an effective surveillance system. In this article, we propose an efficient human detection algorithm that processes regions of interest (ROI) obtained from a foreground estimation. We use the MobileNetV2 deep convolutional neural network, which is designed for embedded devices, with a transfer learning approach to build a fine-tuned model that efficiently classifies each ROI as human or non-human. We train the fine-tuned model on the INRIA person dataset under three scenarios. The resulting models were extensively evaluated on the INRIA test benchmark and achieved F-scores of 98.35%, 98.72%, and 98.90%, which we consider very satisfactory. The best fine-tuned model was used for the classification stage, where it achieved an accuracy of 98.42%, a recall of 99.47%, a precision of 98.34%, and an F-score of 98.90%.
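As a quick consistency check on the figures reported for the best model, the F-score can be recomputed from the stated precision and recall. The sketch below assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name `f_score` and its `beta` parameter are illustrative choices, not taken from the paper.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean (F1).

    Assumption: the F-score reported in the abstract is F1.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported for the best fine-tuned model.
best = f_score(precision=0.9834, recall=0.9947)
print(f"{best * 100:.2f}%")  # prints 98.90%
```

Under the F1 assumption, the recomputed value agrees with the 98.90% stated in the abstract. The reported accuracy cannot be checked the same way, because it depends on the class counts in the test set, which the abstract does not give.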

Notes

https://youtu.be/S8J7t1Efl00.

Author information

Authors and Affiliations

Lastic Laboratory, University of Batna2, Batna, Algeria
Yassine Bouafia & Larbi Guezouli
Sorbonne University, University of Technology of Compiègne, Compiègne, France
Hicham Lakhlef

Corresponding author

Correspondence to Yassine Bouafia.

About this article

Cite this article

Bouafia, Y., Guezouli, L. & Lakhlef, H. Human Detection in Surveillance Videos Based on Fine-Tuned MobileNetV2 for Effective Human Classification. Iran J Sci Technol Trans Electr Eng 46, 971–988 (2022). https://doi.org/10.1007/s40998-022-00512-6

Received: 26 December 2020
Accepted: 04 June 2022
Published: 18 July 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s40998-022-00512-6
