
KeyFrame extraction based on face quality measurement and convolutional neural network for efficient face recognition in videos

Multimedia Tools and Applications

This article has been updated

Abstract

Indexing is the process of extracting a compact, significant and pertinent signature that describes the content of the data. This field has a broad spectrum of promising applications, such as Face in Video Recognition (FiVR), which motivates the interest of researchers around the world. Since a video contains a huge amount of data, extracting the relevant frames is an essential step prior to performing face recognition. In this context, we propose a new method for extracting keyframes from videos based on face quality and deep learning for a face recognition task. Faces are first detected using the MTCNN detector, which locates five landmarks (the two eyes, the two corners of the mouth and the nose), delimits the face with a bounding box, and provides a confidence score. The method then proceeds in two steps. The first step generates a face quality score for each face in the data set prepared for the learning step; to generate these quality scores, we use three face feature extractors: Gabor, LBP and HoG. The second step consists of training a deep Convolutional Neural Network in a supervised manner in order to select the frames having the best face quality. The obtained results show the effectiveness of the proposed method compared to state-of-the-art methods.
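The final selection stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a face detector (such as MTCNN) and a trained quality network have already produced, for each frame, a detection confidence and a quality score. The names `FrameFace` and `select_keyframes`, the confidence threshold of 0.9, and the top-k rule are all hypothetical choices made for illustration.

```python
from typing import List, NamedTuple


class FrameFace(NamedTuple):
    """One detected face in one video frame (hypothetical container)."""
    frame_index: int
    detection_confidence: float  # assumed to come from a face detector such as MTCNN
    quality_score: float         # assumed to come from a trained face-quality CNN


def select_keyframes(faces: List[FrameFace], k: int = 3,
                     min_confidence: float = 0.9) -> List[int]:
    """Return the indices of the k frames with the best face quality.

    Frames whose detection confidence is below min_confidence are discarded
    first; the threshold and the top-k rule are illustrative assumptions.
    """
    candidates = [f for f in faces if f.detection_confidence >= min_confidence]
    ranked = sorted(candidates, key=lambda f: f.quality_score, reverse=True)
    # Return the selected frames in temporal order.
    return sorted(f.frame_index for f in ranked[:k])


# Toy scores, invented purely for the example.
example_faces = [
    FrameFace(0, 0.99, 0.20),
    FrameFace(1, 0.50, 0.95),  # high quality score but unreliable detection
    FrameFace(2, 0.97, 0.80),
    FrameFace(3, 0.95, 0.60),
    FrameFace(4, 0.99, 0.70),
]
keyframes = select_keyframes(example_faces, k=2)
```

With these toy values, frame 1 is dropped by the confidence filter and frames 2 and 4 are kept as keyframes. Only the selected frames would then be passed on to the face recognition stage.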


Change history

  • 27 October 2020

    The original version of this paper was updated to present the correct biography of the first and corresponding author and to present the missing photos and biographies of the second and third authors.


Author information

Corresponding author

Correspondence to Rahma Abed.


About this article

Cite this article

Abed, R., Bahroun, S. & Zagrouba, E. KeyFrame extraction based on face quality measurement and convolutional neural network for efficient face recognition in videos. Multimed Tools Appl 80, 23157–23179 (2021). https://doi.org/10.1007/s11042-020-09385-5


  • DOI: https://doi.org/10.1007/s11042-020-09385-5
