Large-scale 3D Semantic Mapping Using Stereo Vision

Yang, Yi; Qiu, Fan; Li, Hao; Zhang, Lu; Wang, Mei-Ling; Fu, Meng-Yin

doi:10.1007/s11633-018-1118-y

Research Article
Published: 09 March 2018

Volume 15, pages 194–206, (2018)

International Journal of Automation and Computing

Yi Yang¹ (ORCID: orcid.org/0000-0003-3964-2433),
Fan Qiu¹,
Hao Li¹,
Lu Zhang¹,
Mei-Ling Wang¹ &
Meng-Yin Fu¹,²

Abstract

In recent years, there has been considerable interest in incorporating semantics into simultaneous localization and mapping (SLAM) systems. This paper presents an approach for generating a large-scale, dense 3D semantic map of outdoor environments from binocular stereo vision. The inputs to the system are stereo color images from a moving vehicle. First, the dense 3D space around the vehicle is reconstructed, and the camera motion is estimated by visual odometry. Meanwhile, semantic segmentation is performed online with a deep learning model, and the semantic labels are also used to verify the feature matching in visual odometry. Together, these three processes compute the motion, depth and semantic label of every pixel in the input views. Then, a voxel conditional random field (CRF) inference is introduced to fuse the semantic labels into voxels. After that, we present a method that removes moving objects by incorporating the semantic labels, which improves the motion segmentation accuracy. Finally, a dense 3D semantic map of an urban environment is generated from an arbitrarily long image sequence. We evaluate our approach on the KITTI vision benchmark, and the results show that the proposed method is effective.
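
To make the data flow in the abstract concrete, the following minimal Python sketch shows how per-pixel depth, camera pose and semantic labels could be combined into a labelled voxel grid. It is an illustration under stated assumptions, not the method of the paper: the function names, the pinhole stereo model and the pose layout are hypothetical, and a simple per-voxel majority vote stands in for the voxel CRF inference. Moving-object removal is not covered.

```python
import math
from collections import Counter, defaultdict

def backproject(u, v, disparity, f, cx, cy, baseline):
    """Pixel (u, v) with disparity (pixels) -> 3D point in the camera frame.

    Assumes a rectified pinhole stereo pair: depth z = f * baseline / disparity.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = f * baseline / disparity
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return (x, y, z)

def to_world(point, pose):
    """Apply a 3x4 camera-to-world pose [R | t], e.g. from visual odometry."""
    return tuple(
        sum(pose[r][c] * point[c] for c in range(3)) + pose[r][3]
        for r in range(3)
    )

def voxel_index(point, voxel_size):
    """Integer index of the voxel that contains the point."""
    return tuple(math.floor(c / voxel_size) for c in point)

def fuse_labels(observations, voxel_size):
    """Fuse (world_point, label) observations into one label per voxel.

    Majority vote only; this is a stand-in for the CRF, with no smoothing
    between neighbouring voxels.
    """
    votes = defaultdict(Counter)
    for point, label in observations:
        votes[voxel_index(point, voxel_size)][label] += 1
    return {vox: counts.most_common(1)[0][0] for vox, counts in votes.items()}

# Usage with made-up calibration values (hypothetical, not KITTI's).
identity_pose = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
p = to_world(backproject(670.0, 180.0, 10.0, f=700.0, cx=600.0, cy=180.0,
                         baseline=0.5), identity_pose)
semantic_map = fuse_labels([(p, "road"), (p, "road"), (p, "car")], voxel_size=0.5)
print(semantic_map)
```

A majority vote treats every voxel independently, which is why a CRF over voxels, as described in the abstract, is attractive: it can also enforce label consistency between neighbouring voxels.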

Author information

Authors and Affiliations

School of Automation and National Key Laboratory of Intelligent Control and Decision of Complex Systems, Beijing Institute of Technology, Beijing, 100081, China
Yi Yang, Fan Qiu, Hao Li, Lu Zhang, Mei-Ling Wang & Meng-Yin Fu
Nanjing University of Science and Technology, Nanjing, 210094, China
Meng-Yin Fu

Corresponding author

Correspondence to Yi Yang.

Additional information

This work was supported by the National Natural Science Foundation of China (Nos. NSFC 61473042 and 61105092) and the Beijing Higher Education Young Elite Teacher Project (No. YETP1215).

Recommended by Associate Editor Hong Qiao

Yi Yang received the Ph.D. degree in automation from Beijing Institute of Technology, China in 2010. He is currently an associate professor with the School of Automation, Beijing Institute of Technology, China.

His research interests include autonomous vehicles, bioinspired robots, intelligent navigation, semantic mapping and scene understanding.

Fan Qiu received the B.Eng. degree in automation from the Beijing Institute of Technology, China in 2014, where he is currently a master's student in control science and engineering.

His research interests include deep learning, semantic mapping and computer vision.

Hao Li received the B.Eng. degree in automation from the Beijing Institute of Technology, China in 2015, where he is currently a master's student in control science and engineering.

His research interests include machine learning, semantic mapping and scene understanding.

Lu Zhang received the B.Eng. degree in automation from the Beijing Institute of Technology, China in 2015, where he is currently a master's student in control science and engineering.

His research interests include SLAM, path planning and computer vision.

Mei-Ling Wang received the M.Eng. and Ph.D. degrees from School of Automation, Beijing Institute of Technology, China in 1995 and 2007, respectively. She is currently a professor with School of Automation, Beijing Institute of Technology, and a Changjiang Scholar of the Ministry of Education of China.

Her research interests include geographic information systems, intelligent navigation and unmanned ground vehicles.

Meng-Yin Fu received the M.Eng. degree from School of Automation, Beijing Institute of Technology, China in 1992, and the Ph.D. degree from the Chinese Academy of Sciences, China in 2000. He was a professor with School of Automation, Beijing Institute of Technology, China, from 2000 to 2013. He is currently a professor with the Nanjing University of Science and Technology, China, and a Changjiang Scholar of the Ministry of Education of China.

His research interests include integrated navigation systems, intelligent navigation and unmanned ground vehicles.

About this article

Cite this article

Yang, Y., Qiu, F., Li, H. et al. Large-scale 3D Semantic Mapping Using Stereo Vision. Int. J. Autom. Comput. 15, 194–206 (2018). https://doi.org/10.1007/s11633-018-1118-y

Received: 20 September 2017
Accepted: 02 February 2018
Published: 09 March 2018
Issue Date: April 2018
DOI: https://doi.org/10.1007/s11633-018-1118-y
