Human Detection in Surveillance Videos Based on Fine-Tuned MobileNetV2 for Effective Human Classification

  • Research Paper
Iranian Journal of Science and Technology, Transactions of Electrical Engineering

Abstract

With accident and crime rates high around the world, video surveillance is growing in importance, and intelligent surveillance systems are being developed to perform surveillance tasks automatically. Accurately detecting humans in a visual surveillance system is crucial for diverse application areas. The first step in the detection process is to detect moving objects. Each moving object is then classified as human or non-human. Human classification is therefore an important step in building an effective surveillance system. In this article, an efficient human detection algorithm is proposed that processes regions of interest (ROIs) obtained from a foreground estimation. We use the MobileNetV2 deep convolutional neural network, which is designed for embedded devices, with a transfer learning approach to build a fine-tuned model that efficiently classifies each ROI as human or non-human. We train the fine-tuned model on the INRIA person dataset under three scenarios. The resulting models were extensively evaluated on the INRIA test benchmark and achieved F-Score values of 98.35%, 98.72%, and 98.90%, which we consider very satisfactory. The best fine-tuned model, used for the classification stage, achieved an accuracy of 98.42%, a recall of 99.47%, a precision of 98.34%, and an F-Score of 98.90%.
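
The reported metrics can be cross-checked against one another. The following is a minimal sketch, assuming the F-Score in the abstract is the standard F1 measure (the harmonic mean of precision and recall); the abstract does not state the formula, and the function name `f_score` is illustrative rather than taken from the paper.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 is the harmonic mean of precision and recall.

    Assumption: the abstract's "F-Score" is this standard F1 definition.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both inputs are zero: define the score as 0 instead of dividing by zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Precision and recall of the best fine-tuned model, as reported in the abstract.
reported_precision = 0.9834
reported_recall = 0.9947

f1 = f_score(reported_precision, reported_recall)
print(f"F-Score = {100 * f1:.2f}%")  # prints "F-Score = 98.90%"
```

Under that assumption, the reported precision and recall reproduce the reported F-Score of 98.90%.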

Notes

  1. https://youtu.be/S8J7t1Efl00.

Author information

Corresponding author

Correspondence to Yassine Bouafia.

About this article

Cite this article

Bouafia, Y., Guezouli, L. & Lakhlef, H. Human Detection in Surveillance Videos Based on Fine-Tuned MobileNetV2 for Effective Human Classification. Iran J Sci Technol Trans Electr Eng 46, 971–988 (2022). https://doi.org/10.1007/s40998-022-00512-6

  • DOI: https://doi.org/10.1007/s40998-022-00512-6
