Abstract
Video prediction algorithms based on neural networks have recently become a promising research direction. This paper proposes a new recurrent video prediction algorithm called “Robust Spatiotemporal Convolutional Long Short-Term Memory” (Robust-ST-ConvLSTM). Robust-ST-ConvLSTM introduces a new internal mechanism, based on higher-order Convolutional LSTM, that efficiently regulates the flow of spatiotemporal information from video signals. This spatiotemporal information is carried through the entire network to optimize and control the prediction potential of the ConvLSTM cell. In traditional ConvLSTM units, the cell state, which carries relevant information throughout the processing of the input sequence, is updated using only one previous hidden state, which holds information on the data already seen by the network. In contrast, the Robust-ST-ConvLSTM unit relies on N previous hidden states in the cell state update; these provide temporal context for the motion in video scenes. Experimental results suggest that the proposed architecture improves significantly on state-of-the-art video prediction methods on two challenging datasets: the standard Moving MNIST dataset and the KTH human motion dataset, which is commonly used for video prediction.
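To make the idea of updating the cell state from N previous hidden states concrete, the following is a minimal sketch of a generic higher-order ConvLSTM step in plain Python. It is an illustration under stated assumptions, not the Robust-ST-ConvLSTM equations, which the abstract does not give: the frames are single-channel, each gate simply sums one convolution per lagged hidden state, and the names `conv2d_same`, `combine`, `higher_order_step`, `run_sequence`, and the `params` layout are all hypothetical.

```python
import math
from collections import deque


def conv2d_same(frame, kernel):
    """Single-channel 2D cross-correlation with zero padding; output size equals input size."""
    h, w = len(frame), len(frame[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - kh // 2, j + b - kw // 2
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += frame[y][x] * kernel[a][b]
    return out


def combine(fn, *maps):
    """Apply fn element-wise across one or more equally sized 2D maps."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*maps)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def higher_order_step(x, hidden_history, c_prev, params):
    """One recurrent step whose gates see the N most recent hidden states.

    Assumed layout (hypothetical, for illustration only):
    hidden_history: hidden maps ordered newest first (h_{t-1}, ..., h_{t-N}).
    params[gate] = {"wx": kernel, "wh": [one kernel per lag], "b": float}.
    """
    def pre_activation(gate):
        p = params[gate]
        total = conv2d_same(x, p["wx"])
        # A standard ConvLSTM would use only h_{t-1}; here every lag contributes.
        for h_lag, k_lag in zip(hidden_history, p["wh"]):
            total = combine(lambda s, t: s + t, total, conv2d_same(h_lag, k_lag))
        return combine(lambda s: s + p["b"], total)

    i = combine(sigmoid, pre_activation("i"))    # input gate
    f = combine(sigmoid, pre_activation("f"))    # forget gate
    o = combine(sigmoid, pre_activation("o"))    # output gate
    g = combine(math.tanh, pre_activation("g"))  # candidate cell content
    c = combine(lambda fv, cv, iv, gv: fv * cv + iv * gv, f, c_prev, i, g)
    h = combine(lambda ov, cv: ov * math.tanh(cv), o, c)
    return h, c


def run_sequence(frames, params, order):
    """Feed a non-empty list of frames, keeping only the last `order` hidden states."""
    height, width = len(frames[0]), len(frames[0][0])
    zeros = [[0.0] * width for _ in range(height)]
    history = deque([zeros] * order, maxlen=order)
    c = zeros
    for frame in frames:
        h, c = higher_order_step(frame, list(history), c, params)
        history.appendleft(h)  # newest first; the oldest entry drops off
    return h, c
```

Setting `order=1` reduces the sketch to an ordinary single-channel ConvLSTM step, which is one way to see what the extra hidden states add: with `order=N`, motion from up to N steps back can influence the gates directly instead of only through the cell state.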
Acknowledgements
This work was supported in part by the sensors generation project of Nouvelle Aquitaine region (2018-1R50214).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Saideni, W., Helbert, D., Courreges, F., Cances, J.P. (2023). A Novel Video Prediction Algorithm Based on Robust Spatiotemporal Convolutional Long Short-Term Memory (Robust-ST-ConvLSTM). In: Yang, XS., Sherratt, S., Dey, N., Joshi, A. (eds) Proceedings of Seventh International Congress on Information and Communication Technology. Lecture Notes in Networks and Systems, vol 448. Springer, Singapore. https://doi.org/10.1007/978-981-19-1610-6_17
DOI: https://doi.org/10.1007/978-981-19-1610-6_17
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-1609-0
Online ISBN: 978-981-19-1610-6
eBook Packages: Engineering, Engineering (R0)