A Novel Video Prediction Algorithm Based on Robust Spatiotemporal Convolutional Long Short-Term Memory (Robust-ST-ConvLSTM)

Saideni, Wael; Helbert, David; Courreges, Fabien; Cances, Jean Pierre

doi:10.1007/978-981-19-1610-6_17

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 448)

Abstract

Video prediction algorithms based on neural networks have recently become a promising research direction. This paper proposes a new recurrent video prediction algorithm called “Robust Spatiotemporal Convolutional Long Short-Term Memory” (Robust-ST-ConvLSTM). Robust-ST-ConvLSTM introduces a new internal mechanism, based on higher-order Convolutional LSTM, that efficiently regulates the flow of spatiotemporal information from video signals. The spatiotemporal information is carried through the entire network to optimize and control the prediction potential of the ConvLSTM cell. In traditional ConvLSTM units, the cell states, which carry relevant information throughout the processing of the input sequence, are updated using only one previous hidden state, which holds information on the data already seen by the network. In contrast, the Robust-ST-ConvLSTM unit relies on N previous hidden states, which provide temporal context for the motion in video scenes, in the cell state update. Experimental results suggest that the proposed architecture significantly improves on state-of-the-art video prediction methods on two challenging datasets: the standard Moving MNIST dataset and the KTH human motion dataset, which is commonly used for video prediction.
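
The idea of updating the cell state from N previous hidden states can be illustrated with a small sketch. The chapter's actual equations are not reproduced on this page, so the code below is only an assumption-laden illustration: it takes a standard ConvLSTM step and, in the spirit of higher-order recurrent networks, sums a separate convolution over each of the N most recent hidden states inside every gate. The function names, the single-channel frames, the 3x3 kernels, and the summation itself are hypothetical choices, not the authors' method.

```python
import numpy as np

GATES = ("i", "f", "g", "o")  # input, forget, candidate, output


def conv2d_same(x, k):
    """Single-channel 2-D cross-correlation with zero padding (same-size output)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_prev, ksize=3, seed=0):
    """Random kernels: one input kernel and n_prev hidden kernels per gate."""
    rng = np.random.default_rng(seed)
    return {
        "Wx": {g: 0.1 * rng.standard_normal((ksize, ksize)) for g in GATES},
        "Wh": {g: [0.1 * rng.standard_normal((ksize, ksize))
                   for _ in range(n_prev)] for g in GATES},
        "b": {g: 0.0 for g in GATES},
    }


def higher_order_convlstm_step(x, h_hist, c_prev, params):
    """One time step; h_hist holds the N most recent hidden states, newest first."""
    pre = {}
    for g in GATES:
        z = conv2d_same(x, params["Wx"][g]) + params["b"][g]
        # Assumption: each past hidden state contributes through its own kernel.
        for h_n, w_n in zip(h_hist, params["Wh"][g]):
            z = z + conv2d_same(h_n, w_n)
        pre[g] = z
    i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
    cand = np.tanh(pre["g"])
    c = f * c_prev + i * cand      # cell state update
    h = o * np.tanh(c)             # new hidden state
    return h, c


def run_sequence(frames, n_prev=3, seed=0):
    """Feed frames one by one, keeping a sliding window of n_prev hidden states."""
    params = init_params(n_prev, seed=seed)
    shape = frames[0].shape
    h_hist = [np.zeros(shape) for _ in range(n_prev)]
    c = np.zeros(shape)
    outputs = []
    for x in frames:
        h, c = higher_order_convlstm_step(x, h_hist, c, params)
        h_hist = [h] + h_hist[:-1]  # push newest, drop oldest
        outputs.append(h)
    return outputs
```

Setting `n_prev=1` in this sketch reduces the step to an ordinary ConvLSTM update without peephole connections, which makes the contrast drawn in the abstract concrete: the only difference is how many past hidden states enter the gate pre-activations.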

Acknowledgements

This work was supported in part by the sensors generation project of Nouvelle Aquitaine region (2018-1R50214).

Author information

Authors and Affiliations

XLIM Research Institute, UMR CNRS 7252, Limoges, France
Wael Saideni, David Helbert, Fabien Courreges & Jean Pierre Cances

Corresponding author

Correspondence to Wael Saideni.

Editor information

Editors and Affiliations

Middlesex University, London, UK
Xin-She Yang
The University of Reading, Reading, UK
Simon Sherratt
JIS University, Kolkata, India
Nilanjan Dey
Global Knowledge Research Foundation, Ahmedabad, India
Amit Joshi

About this paper

Cite this paper

Saideni, W., Helbert, D., Courreges, F., Cances, J.P. (2023). A Novel Video Prediction Algorithm Based on Robust Spatiotemporal Convolutional Long Short-Term Memory (Robust-ST-ConvLSTM). In: Yang, XS., Sherratt, S., Dey, N., Joshi, A. (eds) Proceedings of Seventh International Congress on Information and Communication Technology. Lecture Notes in Networks and Systems, vol 448. Springer, Singapore. https://doi.org/10.1007/978-981-19-1610-6_17

DOI: https://doi.org/10.1007/978-981-19-1610-6_17
Published: 27 July 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-1609-0
Online ISBN: 978-981-19-1610-6
eBook Packages: Engineering, Engineering (R0)