
A Novel Video Prediction Algorithm Based on Robust Spatiotemporal Convolutional Long Short-Term Memory (Robust-ST-ConvLSTM)

  • Conference paper
Proceedings of Seventh International Congress on Information and Communication Technology

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 448)


Abstract

Video prediction algorithms based on neural networks have recently become a promising research direction. This paper proposes a new recurrent video prediction algorithm called "Robust Spatiotemporal Convolutional Long Short-Term Memory" (Robust-ST-ConvLSTM). Robust-ST-ConvLSTM introduces a new internal mechanism, built on higher-order Convolutional LSTM, that efficiently regulates the flow of spatiotemporal information extracted from video signals. This spatiotemporal information is carried through the entire network to optimize and control the predictive capacity of the ConvLSTM cell. In traditional ConvLSTM units, the cell state, which carries relevant information throughout the processing of the input sequence, is updated using only one previous hidden state, i.e., information from the single most recent input seen by the network. In contrast, the Robust-ST-ConvLSTM unit relies on N previous hidden states in the cell state update, providing temporal context for the motion in video scenes. Experimental results suggest that the proposed architecture significantly improves on state-of-the-art video prediction methods on two challenging datasets: the standard Moving MNIST dataset and the KTH human motion dataset, which is commonly used for video prediction.
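
To illustrate the key idea of updating the cell state from several previous hidden states, the sketch below implements a ConvLSTM-style cell in PyTorch whose gates are computed from the current input and the N most recent hidden states. This is a minimal illustration under stated assumptions (the hidden states are fused by a 1x1 convolution before the standard gate equations, and the class and parameter names are hypothetical); it is not the authors' Robust-ST-ConvLSTM implementation, whose exact update rules may differ.

```python
# Minimal sketch (not the authors' implementation): a ConvLSTM-style cell whose
# gates are conditioned on the current input and on the N most recent hidden
# states. Assumption: the N hidden states are fused by a 1x1 convolution before
# the standard ConvLSTM gate equations.
import torch
import torch.nn as nn


class HigherOrderConvLSTMCell(nn.Module):
    def __init__(self, in_channels, hidden_channels, kernel_size=3, order=2):
        super().__init__()
        self.hidden_channels = hidden_channels
        self.order = order  # N: number of previous hidden states used
        padding = kernel_size // 2
        # Fuse the N previous hidden states into a single context tensor.
        self.fuse = nn.Conv2d(order * hidden_channels, hidden_channels, kernel_size=1)
        # Joint convolution producing input, forget, output gates and candidate.
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels, kernel_size, padding=padding)

    def forward(self, x, hidden_history, c_prev):
        # x: (B, C_in, H, W); hidden_history: list of the N most recent hidden
        # states, each (B, C_hidden, H, W); c_prev: previous cell state.
        h_context = self.fuse(torch.cat(hidden_history, dim=1))
        gates = self.gates(torch.cat([x, h_context], dim=1))
        i, f, o, g = torch.chunk(gates, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c_prev + i * g   # cell state update uses the fused temporal context
        h = o * torch.tanh(c)    # new hidden state
        return h, c


# Usage sketch: keep the N most recent hidden states in a list while unrolling
# over time, e.g. cell = HigherOrderConvLSTMCell(1, 64, order=3), and pass that
# list as hidden_history at every timestep.
```
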



Acknowledgements

This work was supported in part by the sensors generation project of the Nouvelle Aquitaine region (2018-1R50214).

Author information


Corresponding author

Correspondence to Wael Saideni.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Saideni, W., Helbert, D., Courreges, F., Cances, J.P. (2023). A Novel Video Prediction Algorithm Based on Robust Spatiotemporal Convolutional Long Short-Term Memory (Robust-ST-ConvLSTM). In: Yang, XS., Sherratt, S., Dey, N., Joshi, A. (eds) Proceedings of Seventh International Congress on Information and Communication Technology. Lecture Notes in Networks and Systems, vol 448. Springer, Singapore. https://doi.org/10.1007/978-981-19-1610-6_17


  • DOI: https://doi.org/10.1007/978-981-19-1610-6_17


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-1609-0

  • Online ISBN: 978-981-19-1610-6

  • eBook Packages: Engineering (R0)
