Video prediction based on spatial information transfer and time backtracking

  • Original Paper
  • Signal, Image and Video Processing

Abstract

Deep-learning-based video prediction is challenging. Existing prediction networks fail to exploit the useful information in each network layer and lack a backtracking mechanism. This paper proposes a novel video prediction method based on spatial information transfer and time backtracking (SITB). To transfer useful information to the next time step, the SITB network adaptively allocates weights according to the contribution of the spatial information at each network layer. In addition, a time backtracking mechanism is embedded in the network to correct prediction errors through feedback, which reduces the error in frames predicted further into the future. This helps the network capture long-term spatiotemporal trends and strengthens its spatiotemporal prediction ability. The loss function for the backtracking mechanism combines forward and backward predictions. The proposed method is tested on several challenging datasets drawn from very different application domains. The experimental results show that it compares favorably with several state-of-the-art methods.
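
The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so every detail here is an assumption made for illustration and not the authors' implementation: frames and layer features are flat lists of floats, layer weights come from a softmax over hypothetical contribution scores, both loss terms use mean squared error, and the balance factor `lam` is invented.

```python
import math
from typing import List, Sequence

Frame = List[float]  # hypothetical representation: one flattened frame or feature map


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax; turns scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_features: Sequence[Frame], scores: Sequence[float]) -> Frame:
    """Weighted sum of per-layer features (assumed form of adaptive weighting).

    `scores` stands in for each layer's learned contribution; in a real
    network these would be trainable, here they are plain inputs.
    """
    weights = softmax(scores)
    size = len(layer_features[0])
    return [
        sum(w * feat[i] for w, feat in zip(weights, layer_features))
        for i in range(size)
    ]


def bidirectional_loss(
    forward_pred: Sequence[Frame],
    future_true: Sequence[Frame],
    backward_pred: Sequence[Frame],
    past_true: Sequence[Frame],
    lam: float = 1.0,
) -> float:
    """Forward prediction error plus a weighted backward (backtracking) error.

    The MSE terms and the factor `lam` are assumptions; the source only
    states that forward and backward predictions are combined.
    """
    forward_term = sum(mse(p, t) for p, t in zip(forward_pred, future_true))
    forward_term /= len(future_true)
    backward_term = sum(mse(p, t) for p, t in zip(backward_pred, past_true))
    backward_term /= len(past_true)
    return forward_term + lam * backward_term


if __name__ == "__main__":
    fused = fuse_layers([[1.0, 2.0], [3.0, 4.0]], scores=[0.0, 0.0])
    loss = bidirectional_loss(
        forward_pred=[[1.0, 1.0]],
        future_true=[[0.0, 0.0]],
        backward_pred=[[2.0, 2.0]],
        past_true=[[0.0, 0.0]],
        lam=0.5,
    )
    print(fused, loss)
```

With equal scores the fusion reduces to a plain average of the layers, and setting `lam` to zero recovers an ordinary forward-only prediction loss, which makes the contribution of the backtracking term easy to isolate.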

Funding

This work was supported in part by the National Key R&D Program of China (Grant Nos. 2019YFC1520500, 2020YFC1523004).

Author information

Corresponding author

Correspondence to Yepeng Guan.

About this article

Cite this article

Yuan, P., Guan, Y. & Huang, J. Video prediction based on spatial information transfer and time backtracking. SIViP 16, 825–833 (2022). https://doi.org/10.1007/s11760-021-02023-z

  • DOI: https://doi.org/10.1007/s11760-021-02023-z
