A comparative analysis on major key-frame extraction techniques

Sunuwar, Jhuma; Borah, Samarjeet

doi:10.1007/s11042-024-18380-z

Published: 13 February 2024

Multimedia Tools and Applications (2024)

Abstract

Real-time hand gesture recognition involves analyzing videos of static and dynamic gestures. A video is a sequence of images that is captured, and later displayed, at a given frequency. Not all video frames are useful, and processing every frame makes video analysis unnecessarily complex. Methods have therefore been devised to simplify video processing by removing redundant and identical frames. One such approach is key-frame extraction, which identifies and retains only those frames that accurately represent the original content of the video. In this paper, we empirically analyze different methods for key-frame extraction. We present an experimental analysis of five key-frame extraction methods based on Simple Frame Extraction, Uniform Sampling, Structural Similarity Index, Absolute Two Frame Difference, Motion Detection, and an error correction-based key-frame extraction technique using Visual Geometry Group-16 (VGG16). Three publicly available datasets (DVS gesture, American Sign Language (ASL) gesture, and IPN gesture) and two self-constructed datasets (NSL_Consonent and NSL_Vowel) are used to evaluate the performance of the key-frame extraction methods. NSL_Consonent and NSL_Vowel comprise 37 consonants and 17 vowels of the Nepali Sign Language, respectively. The experimental results show that uniform sampling is suitable only for static gestures, for which key frames can be selected without any additional structural information. The Structural Similarity Index, KCKFE based on VGG16, and motion detection-based key-frame extraction are found to be suitable for dynamic gestures. The two-frame absolute difference method performs poorly because it generates as many key frames as there are frames in the video.
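
To make three of the compared ideas concrete, the following is a minimal Python sketch of uniform sampling, consecutive-frame absolute difference, and similarity-based selection. It is not the authors' implementation. The function names, the thresholds, and the use of a single-window (global) SSIM instead of the usual sliding-window SSIM are all illustrative assumptions, and frames are assumed to be grayscale or color NumPy arrays stacked along the first axis.

```python
import numpy as np


def uniform_sampling(num_frames: int, num_keyframes: int) -> list:
    """Pick evenly spaced frame indices without looking at frame content."""
    if num_frames <= 0 or num_keyframes <= 0:
        return []
    k = min(num_keyframes, num_frames)
    # Spread k positions over [0, num_frames - 1] and round to integer indices.
    return np.linspace(0, num_frames - 1, k).round().astype(int).tolist()


def frame_difference_keyframes(frames: np.ndarray, threshold: float) -> list:
    """Keep frame 0, plus every frame whose mean absolute difference
    from the frame immediately before it exceeds `threshold`.

    The threshold is an assumption of this sketch, not a value from the paper.
    """
    if len(frames) == 0:
        return []
    f = frames.astype(np.float64)  # avoid uint8 wrap-around on subtraction
    pixel_axes = tuple(range(1, f.ndim))
    diffs = np.abs(f[1:] - f[:-1]).mean(axis=pixel_axes)
    return [0] + [i + 1 for i, d in enumerate(diffs) if d > threshold]


def global_ssim(a: np.ndarray, b: np.ndarray, data_range: float = 255.0) -> float:
    """SSIM formula evaluated once over the whole image (single window).

    This is a simplification: standard SSIM averages the same formula
    over many local windows.
    """
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(num / den)


def ssim_keyframes(frames: np.ndarray, threshold: float = 0.9) -> list:
    """Keep a frame when its similarity to the last kept key frame
    drops below `threshold` (an illustrative value)."""
    if len(frames) == 0:
        return []
    keep = [0]
    for i in range(1, len(frames)):
        if global_ssim(frames[keep[-1]], frames[i]) < threshold:
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Synthetic 6-frame clip: three black frames, then three bright frames.
    video = np.zeros((6, 4, 4), dtype=np.uint8)
    video[3:] = 200
    print(uniform_sampling(len(video), 3))
    print(frame_difference_keyframes(video, threshold=10.0))
    print(ssim_keyframes(video, threshold=0.9))
```

In this sketch, uniform sampling depends only on the frame count, whereas the other two selectors react to changes in frame content. With a threshold of zero, the difference-based selector would keep every frame that differs at all from its predecessor.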

Data availability

The data used in the current study are available at https://github.com/Jhums-2816/Key-frame-dataset.

Author information

Authors and Affiliations

Department of Computer Applications, Sikkim Manipal Institute of Technology, Sikkim Manipal University, Sikkim, India
Jhuma Sunuwar & Samarjeet Borah

Corresponding author

Correspondence to Samarjeet Borah.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sunuwar, J., Borah, S. A comparative analysis on major key-frame extraction techniques. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18380-z

Received: 16 September 2022
Revised: 24 November 2023
Accepted: 19 January 2024
Published: 13 February 2024
DOI: https://doi.org/10.1007/s11042-024-18380-z
