Abstract
Real-time hand gesture recognition involves analyzing videos of static and dynamic gestures. A video is a sequence of images that is captured, and later displayed, at a given frame rate. Not all video frames are useful, and processing every frame adds unnecessary complexity. Methods have therefore been devised to remove redundant and identical frames and so simplify video processing. One such approach is key-frame extraction, which identifies and retains only those frames that accurately represent the original content of the video. In this paper, we empirically analyze different methods for key-frame extraction. We experimentally evaluate five key-frame extraction methods based on Simple Frame Extraction, Uniform Sampling, Structural Similarity Index, Absolute Two Frame Difference, Motion Detection, and an error correction-based key-frame extraction technique using Visual Geometry Group-16 (VGG16). Three publicly available datasets (DVS gesture, American Sign Language (ASL) gesture, and IPN gesture) and two self-constructed datasets (NSL_Consonent and NSL_Vowel) are used to evaluate the performance of the key-frame extraction methods. NSL_Consonent and NSL_Vowel comprise 37 consonants and 17 vowels of the Nepali Sign Language, respectively. The experimental results show that uniform sampling is suitable only for static gestures that do not require any other structural information for selecting key-frames. Structural Similarity Index, KCKFE based on VGG16, and motion detection-based key-frame extraction are found suitable for dynamic gestures. The two-frame absolute difference method yields poor key-frame generation because it produces as many frames as are present in the video.
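To make two of the compared ideas concrete, the following is a minimal sketch of uniform sampling and of a two-frame absolute difference selector. It is not the implementation evaluated in the paper. The function names, the flattened-list frame representation, the use of consecutive frames, and the `threshold` parameter are all assumptions introduced for illustration only.

```python
from typing import List, Sequence

# Assumption: a frame is a flattened sequence of grayscale pixel intensities.
Frame = Sequence[float]


def uniform_sampling(n_frames: int, n_keys: int) -> List[int]:
    """Return up to n_keys frame indices spread evenly over the video."""
    if n_frames <= 0 or n_keys <= 0:
        return []
    n_keys = min(n_keys, n_frames)
    step = n_frames / n_keys
    return [int(i * step) for i in range(n_keys)]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two frames of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def frame_difference_keyframes(frames: List[Frame], threshold: float) -> List[int]:
    """Keep frame 0, then every frame whose mean absolute difference from
    the immediately preceding frame exceeds the (hypothetical) threshold."""
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keys.append(i)
    return keys


if __name__ == "__main__":
    video = [[0, 0], [0, 0], [10, 10], [10, 10], [11, 11]]
    print(uniform_sampling(len(video), 2))
    print(frame_difference_keyframes(video, threshold=5))
```

Uniform sampling ignores frame content entirely, which is consistent with the observation that it suits only static gestures. In this sketch, the difference-based selector keeps more and more frames as the threshold shrinks, and with a threshold below zero it keeps every frame, which loosely mirrors the behaviour reported for the two-frame absolute difference method.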
Data availability
The data used in the current study are available at https://github.com/Jhums-2816/Key-frame-dataset.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Sunuwar, J., Borah, S. A comparative analysis on major key-frame extraction techniques. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18380-z
DOI: https://doi.org/10.1007/s11042-024-18380-z