A comparative analysis on major key-frame extraction techniques

Multimedia Tools and Applications

Abstract

Real-time hand gesture recognition involves analyzing videos of static and dynamic gestures. A video is a sequence of images captured and displayed at a given frame rate. Not all frames are useful, and processing every frame adds unnecessary complexity. Methods have therefore been devised to remove redundant and identical frames and so simplify video processing. One such approach is key-frame extraction, which identifies and retains only those frames that accurately represent the original content of the video. In this paper, we empirically analyze different methods for key-frame extraction. We experimentally compare five key-frame extraction methods based on Simple Frame Extraction, Uniform Sampling, Structural Similarity Index (SSIM), Absolute Two Frame Difference, Motion Detection, and an error-correction-based key-frame extraction technique using Visual Geometry Group-16 (VGG16). Three publicly available datasets (DVS gesture, American Sign Language (ASL) gesture, and IPN gesture) and two self-constructed datasets (NSL_Consonent and NSL_Vowel) are used to evaluate the methods. NSL_Consonent and NSL_Vowel comprise 37 consonants and 17 vowels of Nepali Sign Language. The experimental results show that uniform sampling is suitable only for static gestures, where selecting key frames requires no additional structural information. The Structural Similarity Index, KCKFE based on VGG16, and motion detection-based key-frame extraction are found suitable for dynamic gestures. The two-frame absolute difference method generates poor key frames because it returns as many frames as the video contains.
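
To make two of the compared ideas concrete, the following is a minimal sketch of uniform sampling and of a two-frame absolute difference selector. It is not the implementation evaluated in the paper. The function names, the representation of a frame as a flat list of grayscale intensities, and the `threshold` parameter are all assumptions introduced here for illustration; the abstract does not state how the evaluated variants select frames.

```python
from typing import List, Sequence

# Assumption: a frame is a flat sequence of grayscale intensities.
Frame = Sequence[int]


def uniform_sampling(num_frames: int, num_keyframes: int) -> List[int]:
    """Return evenly spaced frame indices, ignoring frame content."""
    if num_frames <= 0 or num_keyframes <= 0:
        return []
    if num_keyframes >= num_frames:
        return list(range(num_frames))
    step = num_frames / num_keyframes
    return [int(i * step) for i in range(num_keyframes)]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def frame_difference_keyframes(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Keep frame 0, then each frame whose difference from its predecessor
    exceeds the (hypothetical) threshold."""
    if not frames:
        return []
    keep = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Six tiny synthetic 4-pixel frames: two static pairs, then a small change.
    frames = [[0] * 4, [0] * 4, [10] * 4, [10] * 4, [50] * 4, [51] * 4]
    print(uniform_sampling(len(frames), 3))
    print(frame_difference_keyframes(frames, threshold=5))
```

Uniform sampling depends only on the frame count, so it never looks at what the frames contain. The difference-based selector depends on the chosen threshold: in this sketch, a low threshold lets more frames through and a high one lets fewer through.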

Data availability

The data used in the current study are available at https://github.com/Jhums-2816/Key-frame-dataset.

Author information

Corresponding author

Correspondence to Samarjeet Borah.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sunuwar, J., Borah, S. A comparative analysis on major key-frame extraction techniques. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18380-z

  • DOI: https://doi.org/10.1007/s11042-024-18380-z
