Multi-class indoor semantic segmentation with deep structured model

Zheng, Chuanxia; Wang, Jianhua; Chen, Weihai; Wu, Xingming

doi:10.1007/s00371-017-1411-8

Original Article
Published: 08 June 2017

Volume 34, pages 735–747 (2018)
The Visual Computer

Abstract

Indoor semantic segmentation plays a critical role in many applications, such as intelligent robots. However, multi-class recognition remains challenging, especially for pixel-level indoor semantic labeling. In this paper, we propose a novel deep structured model that combines the strengths of the widely used convolutional neural networks (CNNs) and recurrent neural networks (RNNs). We first present a multi-information fusion model that uses scene category information to fine-tune the fully convolutional network. Then, to refine the coarse outputs of the CNN, an RNN is applied to the final CNN layer, yielding an end-to-end trainable system. This Graph-RNN is derived from a conditional random field defined over a superpixel segmentation graph, which lets it exploit flexible contextual information from different neighboring regions. Experimental results on the large SUN RGB-D dataset show that the proposed model outperforms existing state-of-the-art methods on the challenging 40-dominant-class task (40.8% mean IU and 69.1% pixel accuracy). We also evaluate our model on the public NYU Depth V2 dataset and achieve remarkable performance.
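
The two metrics quoted in the abstract, pixel accuracy and mean IU (intersection over union), are commonly computed from a class confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function names, the `ignore_label` value of 255, and the toy label maps are assumptions made for illustration.

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """Build a num_classes x num_classes matrix; rows are ground truth, columns are predictions."""
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    valid = gt != ignore_label  # assumed convention: unlabeled pixels are skipped
    idx = num_classes * gt[valid].astype(np.int64) + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    """Fraction of labeled pixels whose predicted class matches the ground truth."""
    return np.trace(cm) / cm.sum()

def mean_iu(cm):
    """Mean over classes of TP / (TP + FP + FN), skipping classes that never occur."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0
    return (tp[present] / union[present]).mean()

# Toy 2x3 label maps with three classes (hypothetical data).
gt = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
cm = confusion_matrix(gt, pred, num_classes=3)
print(pixel_accuracy(cm))  # 4 of 6 pixels are correct
print(mean_iu(cm))         # per-class IU is 1/3, 2/3, 1/2, so the mean is 0.5
```

In a dataset-level evaluation the confusion matrix would typically be summed over all test images before the two metrics are computed, rather than averaging per-image scores.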

Acknowledgements

The work described in this paper was supported by the National Science Foundation of China under research project Grant Nos. 61573048 and 61620106012, the International Scientific and Technological Cooperation Projects of China under Grant No. 2015DFG12650, and the Key Laboratory of Robotics and Intelligent Manufacturing Equipment Technology of Zhejiang Province.

Author information

Authors and Affiliations

School of Automation Science and Electrical Engineering, Beihang University, Beijing, 100191, People’s Republic of China
Chuanxia Zheng, Jianhua Wang, Weihai Chen & Xingming Wu

Corresponding author

Correspondence to Weihai Chen.

About this article

Cite this article

Zheng, C., Wang, J., Chen, W. et al. Multi-class indoor semantic segmentation with deep structured model. Vis Comput 34, 735–747 (2018). https://doi.org/10.1007/s00371-017-1411-8

Issue Date: May 2018
