Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks

  • Conference paper
Computer Vision – ACCV 2016 (ACCV 2016)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10111)

Abstract

This work investigates the use of deep fully convolutional neural networks (DFCNN) for pixel-wise scene labeling of Earth Observation images. In particular, we train a variant of the SegNet architecture on remote sensing data over an urban area and study different strategies for performing accurate semantic segmentation. Our contributions are the following: (1) we efficiently transfer a DFCNN from generic everyday images to remote sensing images; (2) we introduce a multi-kernel convolutional layer for fast aggregation of predictions at multiple scales; (3) we perform data fusion from heterogeneous sensors (optical and laser) using residual correction. Our framework improves state-of-the-art accuracy on the ISPRS Vaihingen 2D Semantic Labeling dataset.
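The contributions listed above can be illustrated with a minimal PyTorch-style sketch. This is not the authors' implementation: the module names (MultiKernelPrediction, ResidualCorrectionFusion), the kernel sizes (3, 5, 7), the plain averaging of branch outputs, and the width of the correction block are assumptions made here for illustration; the exact layer configuration is given in the paper.

    # Illustrative sketch only; structure and hyperparameters are assumed, not taken from the paper.
    import torch
    import torch.nn as nn


    class MultiKernelPrediction(nn.Module):
        """Prediction layer that runs parallel convolutions with different kernel
        sizes over the same decoder features and averages the resulting score
        maps, aggregating predictions at multiple spatial scales."""

        def __init__(self, in_channels, num_classes, kernel_sizes=(3, 5, 7)):
            super().__init__()
            self.branches = nn.ModuleList(
                [nn.Conv2d(in_channels, num_classes, kernel_size=k, padding=k // 2)
                 for k in kernel_sizes]
            )

        def forward(self, features):
            scores = [branch(features) for branch in self.branches]
            return torch.stack(scores, dim=0).mean(dim=0)


    class ResidualCorrectionFusion(nn.Module):
        """Fuses score maps from two modality-specific networks (e.g. optical and
        laser-derived inputs): a small convolutional block predicts a residual
        that corrects the averaged predictions."""

        def __init__(self, num_classes, hidden_channels=32):
            super().__init__()
            self.correction = nn.Sequential(
                nn.Conv2d(2 * num_classes, hidden_channels, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden_channels, num_classes, kernel_size=3, padding=1),
            )

        def forward(self, scores_optical, scores_laser):
            average = 0.5 * (scores_optical + scores_laser)
            residual = self.correction(
                torch.cat([scores_optical, scores_laser], dim=1))
            return average + residual

In this reading, the multi-kernel layer would replace the final classification convolution of the SegNet-style decoder, and the residual-correction block would refine the averaged class scores produced by the optical and laser branches.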

Notes

  1. In this benchmark, the evaluation is not performed by us or any other competing team, but directly by the benchmark organizers.

  2. http://www2.isprs.org/vaihingen-2d-semantic-labeling-contest.html.

  3. “ONE_6”: https://www.itc.nl/external/ISPRS_WGIII4/ISPRSIII_4_Test_results/2D_labeling_vaih/2D_labeling_Vaih_details_ONE_6/index.html.

  4. “ONE_7”: https://www.itc.nl/external/ISPRS_WGIII4/ISPRSIII_4_Test_results/2D_labeling_vaih/2D_labeling_Vaih_details_ONE_7/index.html.

References

  1. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)

  2. Everingham, M., Eslami, S.M.A., Gool, L.V., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111, 98–136 (2014)

  3. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10602-1_48

  4. Lagrange, A., Le Saux, B., Beaupere, A., Boulch, A., Chan-Hon-Tong, A., Herbin, S., Randrianarivo, H., Ferecatu, M.: Benchmarking classification of earth-observation data: from learning explicit features to convolutional networks. In: IEEE International Geosciences and Remote Sensing Symposium (IGARSS), pp. 4173–4176 (2015)

  5. Paisitkriangkrai, S., Sherrah, J., Janney, P., Van Den Hengel, A.: Effective semantic pixel labelling with convolutional networks and conditional random fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 36–43 (2015)

  6. Rottensteiner, F., Sohn, G., Jung, J., Gerke, M., Baillard, C., Benitez, S., Breitkopf, U.: The ISPRS benchmark on urban object classification and 3D building reconstruction. ISPRS Ann. Photogrammetry Remote Sens. Spat. Inf. Sci. 1, 3 (2012)

  7. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.: Semantic image segmentation with deep convolutional nets and fully connected CRFs. In: Proceedings of the International Conference on Learning Representations (2015)

  8. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: Proceedings of the International Conference on Learning Representations (2015)

  9. Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., Torr, P.H.S.: Conditional random fields as recurrent neural networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1529–1537 (2015)

  10. Arnab, A., Jayasumana, S., Zheng, S., Torr, P.: Higher order conditional random fields in deep neural networks (2015). arXiv:1511.08119 [cs]

  11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)

  12. Wu, Z., Shen, C., Van Den Hengel, A.: High-performance semantic segmentation using very deep fully convolutional networks (2016). arXiv:1604.04339 [cs]

  13. Yan, Z., Zhang, H., Jia, Y., Breuel, T., Yu, Y.: Combining the best of convolutional layers and recurrent layers: a hybrid network for semantic segmentation. arXiv:1603.04871 [cs] (2016)

  14. Zhao, J., Mathieu, M., Goroshin, R., LeCun, Y.: Stacked what-where auto-encoders. In: Proceedings of the International Conference on Learning Representations (2015)

  15. Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1520–1528 (2015)

  16. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561 (2015)

  17. Mnih, V., Hinton, G.E.: Learning to detect roads in high-resolution aerial images. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6316, pp. 210–223. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15567-3_16

  18. Penatti, O., Nogueira, K., Dos Santos, J.: Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 44–51 (2015)

  19. Campos-Taberner, M., Romero-Soriano, A., Gatta, C., Camps-Valls, G., Lagrange, A., Le Saux, B., Beaupère, A., Boulch, A., Chan-Hon-Tong, A., Herbin, S., Randrianarivo, H., Ferecatu, M., Shimoni, M., Moser, G., Tuia, D.: Processing of extremely high-resolution LiDAR and RGB data: outcome of the 2015 IEEE GRSS data fusion contest part A: 2-D contest. IEEE J. Sel. Topics Appl. Earth Obs. Remote Sens. PP, 1–13 (2016)

  20. Nogueira, K., Penatti, O.A.B., Dos Santos, J.A.: Towards better exploiting convolutional neural networks for remote sensing scene classification. arXiv:1602.01517 [cs] (2016)

  21. Zhao, W., Du, S.: Learning multiscale and deep representations for classifying remotely sensed imagery. ISPRS J. Photogrammetry Remote Sens. 113, 155–165 (2016)

  22. Marmanis, D., Wegner, J.D., Galliani, S., Schindler, K., Datcu, M., Stilla, U.: Semantic segmentation of aerial images with an ensemble of CNNs. ISPRS Ann. Photogrammetry Remote Sens. Spat. Inf. Sci. 3, 473–480 (2016)

  23. Gerke, M.: Use of the stair vision library within the ISPRS 2D semantic labeling benchmark (Vaihingen). Technical report, International Institute for Geo-Information Science and Earth Observation (2015)

  24. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. In: Proceedings of the British Machine Vision Conference, pp. 6.1–6.12. British Machine Vision Association (2014)

  25. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 [cs] (2014)

  26. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 448–456 (2015)

  27. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)

  28. Clevert, D.A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). In: Proceedings of the International Conference on Learning Representations (2015)

  29. Marmanis, D., Datcu, M., Esch, T., Stilla, U.: Deep learning earth observation classification using imagenet pretrained networks. IEEE Geosci. Remote Sens. Lett. 13, 105–109 (2016)

  30. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)

  31. Liao, R., Tao, X., Li, R., Ma, Z., Jia, J.: Video super-resolution via deep draft-ensemble learning. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 531–539 (2015)

  32. Liao, Z., Carneiro, G.: Competitive multi-scale convolution. arXiv:1511.05635 [cs] (2015)

  33. Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., Ng, A.Y.: Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML 2011), pp. 689–696 (2011)

  34. Eitel, A., Springenberg, J.T., Spinello, L., Riedmiller, M., Burgard, W.: Multimodal deep learning for robust RGB-D object recognition. In: Proceedings of the International Conference on Intelligent Robots and Systems, pp. 681–687. IEEE (2015)

  35. Quang, N.T., Thuy, N.T., Sang, D.V., Binh, H.T.T.: An efficient framework for pixel-wise building segmentation from aerial images. In: Proceedings of the Sixth International Symposium on Information and Communication Technology, p. 43. ACM (2015)

  36. Boulch, A.: DAG of convolutional networks for semantic labeling. Technical report, Office national d’études et de recherches aérospatiales (2015)

  37. Lin, G., Shen, C., Van Den Hengel, A., Reid, I.: Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)

  38. Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems, pp. 3320–3328 (2014)

  39. Cramer, M.: The DGPF test on digital aerial camera evaluation - overview and test design. Photogrammetrie - Fernerkundung - Geoinformation 2, 73–82 (2010)

Acknowledgement

The Vaihingen data set was provided by the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF) [39]: http://www.ifp.uni-stuttgart.de/dgpf/DKEP-Allg.html.

Nicolas Audebert’s work is supported by the Total-ONERA research project NAOMI. The authors acknowledge the support of the French Agence Nationale de la Recherche (ANR) under reference ANR-13-JS02-0005-01 (Asterix project).

Author information

Corresponding author

Correspondence to Nicolas Audebert.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Audebert, N., Le Saux, B., Lefèvre, S. (2017). Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks. In: Lai, S.-H., Lepetit, V., Nishino, K., Sato, Y. (eds.) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol. 10111. Springer, Cham. https://doi.org/10.1007/978-3-319-54181-5_12

  • DOI: https://doi.org/10.1007/978-3-319-54181-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54180-8

  • Online ISBN: 978-3-319-54181-5
