Enhanced video temporal segmentation using a Siamese network with multimodal features

Mohamed, Bouyahi; Yassine, Ben Ayed

doi:10.1007/s11760-023-02662-4

Original Paper
Published: 07 July 2023

Signal, Image and Video Processing, volume 17, pages 4295–4303 (2023)

Bouyahi Mohamed¹ &
Ben Ayed Yassine¹

Abstract

Shot boundary detection (SBD) is a critical pre-processing task for intelligent video analysis applications. In this study, we proposed a novel multimodal approach to SBD that uses a Siamese network to learn a distance measure between audiovisual features. To extract relevant features from the power spectral density (PSD) of the audio stream, we combined tsfresh, a Python package for time series feature extraction, with principal component analysis (PCA), and used a GRU-attention network to learn sequential semantic representations and spatial location information. For the visual modality, we employed the pre-trained EfficientNet model to extract and learn visual features. The proposed network learned a similarity score from the image embedding features and the PSD-based audio features, and these scores were then used to build a signal representing audiovisual change. We applied a global threshold to detect transitions and an adaptive threshold to distinguish between the detected transition types (abrupt or gradual). In our experiments on standard datasets (TRECVID 2001 and TRECVID 2007), introducing audio features yielded a significant improvement in F1 score (92.43%) and in gradual transition detection (90.08%) compared with state-of-the-art models.
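The two-threshold step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual threshold formulas, so everything below is an assumption made for illustration: the function name `detect_transitions`, the global rule (mean plus `k` standard deviations), the adaptive rule (local mean plus `beta` local standard deviations in a window around the peak), and the parameters `k`, `beta`, and `half_window` are all hypothetical. The input is taken to be a per-frame dissimilarity signal, for example one minus the similarity score.

```python
from statistics import mean, pstdev


def detect_transitions(signal, k=1.0, beta=3.0, half_window=5):
    """Return (start, end, kind) tuples found in a dissimilarity signal.

    Hypothetical rules, not the paper's: a global threshold flags candidate
    frames, and a local (adaptive) threshold labels each candidate run.
    """
    signal = list(signal)
    # Assumed global rule: mean + k * std over the whole signal.
    global_thr = mean(signal) + k * pstdev(signal)

    # Group consecutive above-threshold frames into candidate runs.
    runs, start = [], None
    for i, value in enumerate(signal):
        if value > global_thr and start is None:
            start = i
        elif value <= global_thr and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(signal) - 1))

    results = []
    for start, end in runs:
        # Frame with the largest dissimilarity inside the run.
        peak = max(range(start, end + 1), key=lambda i: signal[i])
        lo = max(0, peak - half_window)
        hi = min(len(signal), peak + half_window + 1)
        # Local statistics around the peak, excluding the peak itself.
        neighbours = signal[lo:peak] + signal[peak + 1:hi]
        if not neighbours:
            results.append((start, end, "abrupt"))
            continue
        # Assumed adaptive rule: local mean + beta * local std.
        adaptive_thr = mean(neighbours) + beta * pstdev(neighbours)
        # An isolated spike stands far above its neighbours (abrupt);
        # a peak surrounded by elevated values does not (gradual).
        kind = "abrupt" if signal[peak] > adaptive_thr else "gradual"
        results.append((start, end, kind))
    return results


# Toy signal: one sharp spike, then a slow rise and fall.
toy = [0.05] * 10 + [0.9] + [0.05] * 10 + [0.4, 0.5, 0.55, 0.5, 0.4] + [0.05] * 10
print(detect_transitions(toy))
```

On real data the constants would need tuning per dataset, and the labelling rule could just as plausibly rely on run length or slope; the sketch only shows how a global and a local threshold can play different roles.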

Author information

Authors and Affiliations

MIRACL Laboratory: Multimedia, InfoRmation systems and Advanced Computing, Sfax University, Sfax, Tunisia
Bouyahi Mohamed & Ben Ayed Yassine

Corresponding author

Correspondence to Bouyahi Mohamed.

About this article

Cite this article

Mohamed, B., Yassine, B.A. Enhanced video temporal segmentation using a Siamese network with multimodal features. SIViP 17, 4295–4303 (2023). https://doi.org/10.1007/s11760-023-02662-4

Received: 06 March 2022
Revised: 23 May 2023
Accepted: 01 June 2023
Published: 07 July 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s11760-023-02662-4
