Video prediction based on spatial information transfer and time backtracking

Yuan, Peng; Guan, Yepeng; Huang, Jizhong

doi:10.1007/s11760-021-02023-z

Original Paper
Published: 07 March 2022

Volume 16, pages 825–833, (2022)

Signal, Image and Video Processing

Peng Yuan¹,
Yepeng Guan¹,² &
Jizhong Huang³

Abstract

Deep learning-based video prediction remains challenging. Existing prediction networks do not fully exploit the useful information available at each network layer, and they lack a backtracking mechanism. This paper proposes a video prediction method based on spatial information transfer and time backtracking (SITB). To transfer useful information to the next time step, the SITB network adaptively allocates weights according to the contribution of the spatial information at each network layer. A time backtracking mechanism is also embedded in the network: it corrects prediction errors through feedback and reduces the prediction error on frames further into the future. This helps the network capture long-term spatiotemporal trends and strengthens its spatiotemporal prediction ability. The loss function for the backtracking mechanism is constructed by combining forward and backward predictions. The method is evaluated on several challenging datasets drawn from very different application domains. The experimental results show that it achieves excellent performance in comparison with several state-of-the-art methods.
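
The abstract does not give formulas, so the following is only an illustrative sketch of the two ideas it describes, not the authors' SITB implementation. It assumes that the per-layer contribution scores are turned into weights with a softmax, that both prediction errors are measured with mean squared error, and that a hypothetical factor `lam` balances the backward term. The function names `fuse_layer_features` and `backtracking_loss` are likewise invented for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def fuse_layer_features(features, scores):
    """Weighted sum of per-layer spatial feature maps.

    features: list of L arrays, each of shape (H, W)
    scores:   L contribution scores (assumed to be given or learned)
    The softmax weighting is an assumption, not taken from the paper.
    """
    weights = softmax(np.asarray(scores, dtype=float))
    stacked = np.stack(features)                    # shape (L, H, W)
    fused = np.tensordot(weights, stacked, axes=1)  # shape (H, W)
    return fused, weights


def backtracking_loss(pred_fwd, target_fwd, pred_bwd, target_bwd, lam=1.0):
    """Combine forward and backward prediction errors.

    MSE and the balancing factor `lam` are assumptions for this sketch.
    """
    forward_error = np.mean((pred_fwd - target_fwd) ** 2)
    backward_error = np.mean((pred_bwd - target_bwd) ** 2)
    return forward_error + lam * backward_error
```

With equal scores every layer contributes equally to the fused map, and a larger `lam` penalises errors in reconstructing past frames more strongly relative to errors in predicting future frames.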

Funding

This work was supported in part by the National Key R&D Program of China (Grant Nos. 2019YFC1520500, 2020YFC1523004).

Author information

Authors and Affiliations

School of Communication and Information Engineering, Shanghai University, Shanghai, 200444, China
Peng Yuan & Yepeng Guan
Key Laboratory of Advanced Display and System Application, Ministry of Education, Shanghai, 200072, China
Yepeng Guan
Institute for the Conservation of Cultural Heritage, Shanghai University, Shanghai, 200444, China
Jizhong Huang

Corresponding author

Correspondence to Yepeng Guan.

About this article

Cite this article

Yuan, P., Guan, Y. & Huang, J. Video prediction based on spatial information transfer and time backtracking. SIViP 16, 825–833 (2022). https://doi.org/10.1007/s11760-021-02023-z

Received: 20 May 2021
Revised: 16 July 2021
Accepted: 01 September 2021
Published: 07 March 2022
Issue Date: April 2022
DOI: https://doi.org/10.1007/s11760-021-02023-z
