Forecasting Future Instance Segmentation with Learned Optical Flow and Warping

Ciamarra, Andrea; Becattini, Federico; Seidenari, Lorenzo; Del Bimbo, Alberto

doi:10.1007/978-3-031-06433-3_30


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13233)

Included in the following conference series:

International Conference on Image Analysis and Processing


Abstract

For an autonomous vehicle, it is essential to observe the ongoing dynamics of a scene and, consequently, to predict imminent future scenarios in order to ensure its own safety and that of others. This can be done using different sensors and modalities. In this paper, we investigate the use of optical flow for predicting future semantic segmentations. To do so, we propose a model that forecasts flow fields autoregressively. These predictions then guide the inference of a learned warping function that moves instance segmentations onto future frames. Results on the Cityscapes dataset demonstrate the effectiveness of optical-flow methods.
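
The pipeline described in the abstract (forecast a flow field, feed it back as input, and use it to move instance masks forward in time) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The learned flow forecaster is replaced by a hypothetical stand-in, `predict_next_flow`, that simply repeats the last flow, and the learned warping function is replaced by a fixed nearest-neighbour backward warp. The function names, the array layout, and the flow convention (displacement stored at the target pixel) are all assumptions made for illustration.

```python
import numpy as np


def warp_mask(mask, flow):
    """Backward-warp an instance-ID mask with a dense flow field.

    mask: (H, W) integer array, 0 = background, k > 0 = instance k.
    flow: (H, W, 2) float array; flow[y, x] = (dx, dy) is the motion that
          brings content to pixel (x, y) of the next frame (assumed convention).
    """
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Each target pixel looks up the source pixel it came from.
    src_x = np.rint(xs - flow[..., 0]).astype(int)
    src_y = np.rint(ys - flow[..., 1]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(mask)  # pixels with no valid source stay background
    out[valid] = mask[src_y[valid], src_x[valid]]
    return out


def predict_next_flow(past_flows):
    """Hypothetical stand-in for a learned forecaster: repeat the last flow."""
    return past_flows[-1].copy()


def forecast_masks(mask, past_flows, steps):
    """Autoregressively forecast flows and warp the mask one step at a time."""
    flows = list(past_flows)
    current = mask
    forecasts = []
    for _ in range(steps):
        next_flow = predict_next_flow(flows)
        flows.append(next_flow)  # the prediction becomes input for the next step
        current = warp_mask(current, next_flow)
        forecasts.append(current)
    return forecasts


# Usage: one 2x2 instance moving 2 pixels to the right per frame.
mask = np.zeros((6, 8), dtype=int)
mask[2:4, 1:3] = 1
flow = np.zeros((6, 8, 2))
flow[..., 0] = 2.0
future = forecast_masks(mask, [flow], steps=2)
```

Nearest-neighbour sampling is used here because instance IDs are categorical and should not be blended; a learned warping module, as in the paper, could instead correct errors that accumulate when predicted flows are fed back over several steps.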



Acknowledgement

This work was supported by the European Commission under the European Horizon 2020 Programme, grant number 951911 - AI4Media.

Author information

Authors and Affiliations

University of Florence, Florence, Italy
Andrea Ciamarra, Federico Becattini, Lorenzo Seidenari & Alberto Del Bimbo


Corresponding author

Correspondence to Federico Becattini.

Editor information

Editors and Affiliations

Boston University, Boston, MA, USA
Stan Sclaroff
National Research Council, Lecce, Italy
Cosimo Distante
National Research Council, Lecce, Italy
Marco Leo
University of Catania, Catania, Italy
Giovanni M. Farinella
Technische Universität München, Garching, Germany
Federico Tombari


About this paper

Cite this paper

Ciamarra, A., Becattini, F., Seidenari, L., Del Bimbo, A. (2022). Forecasting Future Instance Segmentation with Learned Optical Flow and Warping. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13233. Springer, Cham. https://doi.org/10.1007/978-3-031-06433-3_30


DOI: https://doi.org/10.1007/978-3-031-06433-3_30
Published: 15 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06432-6
Online ISBN: 978-3-031-06433-3
eBook Packages: Computer Science, Computer Science (R0)
