Abstract
Video prediction with deep learning remains challenging. Existing prediction networks do not fully exploit the useful information available at each network layer, and they lack a backtracking mechanism. This paper proposes a video prediction method based on spatial information transfer and time backtracking (SITB). To pass useful information on to the next time step, the SITB network adaptively allocates weights according to the contribution of the spatial information at each network layer. In addition, a time backtracking mechanism embedded in the network corrects the prediction error through feedback, which reduces the error on frames further into the future. This helps the network capture long-term spatiotemporal trends and strengthens its spatiotemporal prediction ability. The loss function for the backtracking mechanism combines forward and backward predictions. The method is evaluated on several challenging datasets from very different application domains. The experimental results show that it performs well in comparison with several state-of-the-art methods.
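The abstract gives no formulas for its two main ideas, so the following is only an illustrative sketch and not the authors' implementation. It assumes that the per-layer weights come from a softmax over contribution scores, and that the backtracking loss is a forward mean squared error plus a weighted backward one. The names `fuse_layers`, `backtracking_loss`, and the balance factor `lam` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw per-layer scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_features, scores):
    """Weighted sum of per-layer feature vectors.

    Assumption: each layer's contribution is summarized by one scalar
    score; the abstract does not say how contributions are measured.
    """
    weights = softmax(scores)
    size = len(layer_features[0])
    return [
        sum(w * feat[i] for w, feat in zip(weights, layer_features))
        for i in range(size)
    ]


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def backtracking_loss(forward_pred, future_frame,
                      backward_pred, past_frame, lam=1.0):
    """Forward prediction error plus lam times the backward error.

    Assumption: both terms are MSE and lam is a tunable hyperparameter.
    """
    return mse(forward_pred, future_frame) + lam * mse(backward_pred, past_frame)


# Two layers with equal scores receive equal weights of 0.5 each.
fused = fuse_layers([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])

# Forward and backward errors are both 2.0; with lam=0.5 the total is 3.0.
loss = backtracking_loss([1.0, 2.0], [1.0, 4.0],
                         [0.0, 0.0], [0.0, 2.0], lam=0.5)
```

In this toy form, frames and features are flat lists of numbers. A real model would operate on image tensors and would most likely learn the contribution scores instead of taking them as inputs.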
Funding
This work is supported in part by National Key R&D Program of China (Grant Nos. 2019YFC1520500, 2020YFC1523004).
Cite this article
Yuan, P., Guan, Y. & Huang, J. Video prediction based on spatial information transfer and time backtracking. SIViP 16, 825–833 (2022). https://doi.org/10.1007/s11760-021-02023-z
DOI: https://doi.org/10.1007/s11760-021-02023-z