Season-Invariant Semantic Segmentation with a Deep Multimodal Network

  • Conference paper
Field and Service Robotics

Part of the book series: Springer Proceedings in Advanced Robotics (SPAR, volume 5)

Abstract

Semantic scene understanding is a useful capability for autonomous vehicles operating off-road. While cameras are the most common sensor used for semantic classification, the performance of camera-based methods may suffer when there is significant variation between the training and testing sets caused by illumination, weather, and seasonal changes. In contrast, 3D information from active sensors such as LiDAR is comparatively invariant to these factors, which motivates us to investigate whether it can be used to improve performance in this scenario. In this paper, we propose a novel multimodal Convolutional Neural Network (CNN) architecture consisting of two streams, 2D and 3D, which are fused by projecting 3D features to image space to achieve robust pixelwise semantic segmentation. We evaluate our proposed method on a novel off-road terrain classification benchmark and show a 25% improvement in mean Intersection over Union (IoU) of navigation-related semantic classes, relative to an image-only baseline.
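
As a reading aid for the metric named in the abstract, the following is a minimal sketch of how mean IoU is commonly computed from a pixelwise confusion matrix. The function names (`confusion_matrix`, `mean_iou`, `relative_improvement`), the toy label maps, and the scores 0.4 and 0.5 are all hypothetical and are not taken from the paper. Reading the 25% figure as a relative gain over the baseline is also an assumption made here for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels for every (true class, predicted class) pair."""
    idx = num_classes * y_true.ravel() + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(y_true, y_pred, num_classes):
    """Average of TP / (TP + FP + FN) over classes that occur in either map."""
    cm = confusion_matrix(y_true, y_pred, num_classes)
    tp = np.diag(cm)
    # Rows are true classes and columns are predictions, so
    # row sum + column sum - TP equals TP + FN + FP.
    union = cm.sum(axis=1) + cm.sum(axis=0) - tp
    valid = union > 0  # skip classes absent from both maps
    return float((tp[valid] / union[valid]).mean())

def relative_improvement(new_score, baseline_score):
    """Gain expressed as a fraction of the baseline score."""
    return (new_score - baseline_score) / baseline_score

# Toy 2x3 label maps with three classes (illustrative values only).
labels = np.array([[0, 0, 1],
                   [1, 2, 2]])
pred = np.array([[0, 1, 1],
                 [1, 2, 0]])

print(mean_iou(labels, pred, num_classes=3))
print(relative_improvement(0.5, 0.4))
```

On the toy maps the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Under the relative reading, a hypothetical baseline of 0.4 and a hypothetical fused score of 0.5 would correspond to a gain of about 25%.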

Notes

  1. For performance reasons, we simplify the point cloud network by replacing the dilated and asymmetric convolution layers with regular convolution layers. We also replace each deconvolution layer with an upsampling layer followed by a \(3 \times 3 \times 3\) convolutional layer with stride 1. For simplicity, we still use the term “deconvolution”.

  2. The point cloud is represented by a 3D voxel grid, as a convolutional architecture requires a regular input data format.
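
The two notes above can be sketched in NumPy under stated assumptions. This is not the authors' implementation: the helper names (`voxelize`, `upsample_nearest`, `conv3d_same`), the binary occupancy encoding, the nearest-neighbor upsampling mode, the zero padding, and the single-channel averaging kernel are all choices made here for illustration. The paper's network may use different voxel features, multiple channels, and learned weights.

```python
import numpy as np

def voxelize(points, origin, voxel_size, grid_shape):
    """Convert an (N, 3) point array into a binary occupancy grid.

    Assumption: a voxel is 1 if at least one point falls inside it.
    Points outside the grid are dropped.
    """
    idx = np.floor((points - origin) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    grid = np.zeros(grid_shape, dtype=np.float32)
    ix, iy, iz = idx[inside].T
    grid[ix, iy, iz] = 1.0
    return grid

def upsample_nearest(volume, factor=2):
    """Nearest-neighbor upsampling along all three spatial axes."""
    for axis in range(3):
        volume = np.repeat(volume, factor, axis=axis)
    return volume

def conv3d_same(volume, kernel):
    """Stride-1 3x3x3 convolution (CNN-style cross-correlation).

    Zero padding of one voxel keeps the output the same shape as the input.
    """
    d, h, w = volume.shape
    padded = np.pad(volume, 1)  # constant zero padding on every side
    out = np.zeros_like(volume)
    for dz in range(3):
        for dy in range(3):
            for dx in range(3):
                out += kernel[dz, dy, dx] * padded[dz:dz + d, dy:dy + h, dx:dx + w]
    return out

# Toy point cloud in metres; the last point lies outside the 2x2x2 grid.
points = np.array([[0.1, 0.1, 0.1],
                   [0.9, 0.2, 0.1],
                   [1.5, 1.5, 0.5],
                   [5.0, 0.0, 0.0]])
grid = voxelize(points, origin=np.zeros(3), voxel_size=1.0, grid_shape=(2, 2, 2))

# "Deconvolution" in the sense of note 1: upsample, then convolve.
kernel = np.full((3, 3, 3), 1.0 / 27.0, dtype=np.float32)  # stand-in for learned weights
decoded = conv3d_same(upsample_nearest(grid), kernel)
print(grid.sum(), decoded.shape)
```

The first two toy points share a voxel, so two voxels end up occupied. Upsampling by a factor of 2 turns the 2x2x2 grid into a 4x4x4 volume, and the padded stride-1 convolution leaves that shape unchanged.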

Acknowledgements

We thank the Yamaha Motor Corporation for supporting this research.

Corresponding author

Correspondence to Dong-Ki Kim.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Kim, DK., Maturana, D., Uenoyama, M., Scherer, S. (2018). Season-Invariant Semantic Segmentation with a Deep Multimodal Network. In: Hutter, M., Siegwart, R. (eds) Field and Service Robotics. Springer Proceedings in Advanced Robotics, vol 5. Springer, Cham. https://doi.org/10.1007/978-3-319-67361-5_17

  • DOI: https://doi.org/10.1007/978-3-319-67361-5_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67360-8

  • Online ISBN: 978-3-319-67361-5

  • eBook Packages: Engineering, Engineering (R0)
