Abstract
Video classification is a foundational task in computer vision that involves categorizing and labeling videos based on their content. Its significance is evident in a wide range of applications, including video surveillance, content recommendation, action recognition, and video indexing. The goal of video classification is to automatically analyze and understand the visual information present in videos, enabling efficient organization, retrieval, and interpretation of large video collections. The fusion of convolutional neural networks (CNNs) and long short-term memory (LSTM) networks has substantially advanced video classification by capturing both spatial and temporal dependencies within video sequences. This fusion combines the strength of CNNs in extracting spatial features with the strength of LSTMs in modeling sequential and temporal information. Two widely adopted architectures that incorporate this fusion are ConvLSTM and LRCN (Long-term Recurrent Convolutional Network). This paper explores the impact of convolutions on LSTM networks in the context of video classification and compares the performance of ConvLSTM and LRCN.
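As a rough illustration of how the two fusions differ, the sketch below counts trainable parameters for one recurrent layer in each style. It is not taken from the paper: the frame size, feature dimension, and unit counts are hypothetical, and the ConvLSTM formula assumes the common formulation without peephole connections. In an LRCN-style model the LSTM receives a per-frame CNN feature vector and its hidden state is a vector; in a ConvLSTM the gate multiplications are replaced by convolutions, so the hidden state keeps its spatial layout.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Parameters of a standard LSTM layer.

    Four gates, each with an input matrix, a recurrent matrix, and a bias.
    """
    return 4 * (input_dim * units + units * units + units)


def convlstm_params(in_channels: int, filters: int, kernel: int) -> int:
    """Parameters of a ConvLSTM layer (assumed: no peephole terms).

    Four gates, each with a kernel x kernel convolution over the
    concatenated input and hidden channels, plus one bias per filter.
    """
    return 4 * (kernel * kernel * (in_channels + filters) * filters + filters)


# Hypothetical sizes, chosen only for illustration.
HEIGHT, WIDTH, CHANNELS = 64, 64, 3   # one RGB frame
FEATURE_DIM = 256                     # assumed CNN feature vector per frame
UNITS = 32                            # LSTM units / ConvLSTM filters

# LRCN style: CNN features per frame, then an LSTM over the feature sequence.
lrcn_lstm = lstm_params(FEATURE_DIM, UNITS)
lrcn_state_shape = (UNITS,)

# ConvLSTM: 3x3 convolutions inside the gates, applied to the frame itself.
convlstm = convlstm_params(CHANNELS, UNITS, kernel=3)
convlstm_state_shape = (HEIGHT, WIDTH, UNITS)  # assuming "same" padding

# Baseline: a plain LSTM fed the flattened raw frame, with no convolution.
flat_lstm = lstm_params(HEIGHT * WIDTH * CHANNELS, UNITS)

print("LRCN-style LSTM:", lrcn_lstm, "state", lrcn_state_shape)
print("ConvLSTM:       ", convlstm, "state", convlstm_state_shape)
print("Flattened LSTM: ", flat_lstm)
```

Under these assumed sizes, the convolutional gates need far fewer weights than a fully connected LSTM on raw pixels, because the same small kernel is shared across all spatial positions. The LRCN count covers only the recurrent layer; the CNN feature extractor in front of it adds its own parameters.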
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Benzyane, M., Azrour, M., Zeroual, I., Agoujil, S. (2024). Exploring the Impact of Convolutions on LSTM Networks for Video Classification. In: Farhaoui, Y., Hussain, A., Saba, T., Taherdoost, H., Verma, A. (eds) Artificial Intelligence, Data Science and Applications. ICAISE 2023. Lecture Notes in Networks and Systems, vol 838. Springer, Cham. https://doi.org/10.1007/978-3-031-48573-2_4
DOI: https://doi.org/10.1007/978-3-031-48573-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48572-5
Online ISBN: 978-3-031-48573-2
eBook Packages: Intelligent Technologies and Robotics (R0)