Abstract
In this paper, we present a temporal capsule network architecture that encodes motion in videos as an instantiation parameter. The extracted motion is used to perform motion-compensated error concealment. We modify the original capsule architecture and use a carefully curated dataset to enable the training of capsules both spatially and temporally. First, we add the temporal dimension by taking co-located "patches" from three consecutive frames of standard video sequences to form input data "cubes." Second, the network is designed with an initial feature extraction layer that operates on all three dimensions to generate spatiotemporal features. Additionally, we implement the PrimaryCaps module with a recurrent layer, instead of a conventional convolutional layer, to extract short-term motion-related temporal dependencies and encode them as activation vectors in the capsule output. Finally, the capsule output is combined with the most recent past frame and passed through a fully connected reconstruction network to perform motion-compensated error concealment. We study the effectiveness of temporal capsules by comparing the proposed model with architectures that do not include capsules. Although the quality of the reconstruction shows room for improvement, we successfully demonstrate that capsule-based architectures can be designed to operate in the temporal dimension and encode motion-related attributes as instantiation parameters. The accuracy of motion estimation is evaluated by comparing both the reconstructed frame outputs and the corresponding optical flow estimates with ground truth data.
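The input "cubes" described above can be illustrated with a minimal NumPy sketch: co-located patches are taken from a sliding window of three consecutive grayscale frames and stacked along a third (temporal) dimension. The patch size, stride, and function name here are illustrative assumptions, not the paper's exact preprocessing.

```python
import numpy as np

def extract_cubes(frames, patch=16, stride=16):
    """Stack co-located patches from three consecutive frames into
    spatiotemporal input "cubes" of shape (patch, patch, 3).

    frames : array of shape (T, H, W), T >= 3 grayscale frames.
    patch/stride are illustrative choices, not the paper's settings.
    """
    T, H, W = frames.shape
    cubes = []
    for t in range(T - 2):                         # sliding window of 3 frames
        for y in range(0, H - patch + 1, stride):
            for x in range(0, W - patch + 1, stride):
                # co-located patch across frames t, t+1, t+2
                cube = frames[t:t + 3, y:y + patch, x:x + patch]
                cubes.append(np.transpose(cube, (1, 2, 0)))
    return np.stack(cubes)

# 4 frames of 32x32 -> 2 temporal windows x 4 spatial patches = 8 cubes
cubes = extract_cubes(np.zeros((4, 32, 32)), patch=16, stride=16)
```

Each resulting cube is then a natural input to a 3D feature-extraction layer, which produces the spatiotemporal features consumed by the recurrent PrimaryCaps module.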
Sankisa, A., Punjabi, A. & Katsaggelos, A.K. Temporal capsule networks for video motion estimation and error concealment. SIViP 14, 1369–1377 (2020). https://doi.org/10.1007/s11760-020-01671-x