
Cognitive Computation, Volume 10, Issue 6, pp 875–889

Ongoing Evolution of Visual SLAM from Geometry to Deep Learning: Challenges and Opportunities

  • Ruihao Li
  • Sen Wang
  • Dongbing Gu
Article

Abstract

Visual simultaneous localization and mapping (SLAM) has been investigated in the robotics community for decades. Significant progress and achievements on visual SLAM have been made, with geometric model-based techniques becoming increasingly mature and accurate. However, these techniques tend to be fragile in challenging environments. Recently, there has been a trend towards developing data-driven approaches, e.g., deep learning, for visual SLAM problems with more robust performance. This paper aims to trace the ongoing evolution of visual SLAM techniques from geometric model-based to data-driven approaches by providing a comprehensive technical review. Our contribution is not just a compilation of state-of-the-art end-to-end deep learning SLAM work, but also an insight into the underlying mechanisms of deep learning SLAM. For this purpose, we first provide a concise overview of geometric model-based approaches. Next, we identify visual depth estimation using deep learning as the starting point of the evolution: it is from depth estimation that ego-motion or pose estimation techniques using deep learning have flourished rapidly. In addition, we strive to link semantic segmentation using deep learning with emergent semantic SLAM techniques to shed light on the simultaneous estimation of ego-motion and high-level understanding. Finally, we highlight further opportunities in this research direction.
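To make the depth-to-pose link concrete, the following is a minimal worked sketch of the view-synthesis (photometric) objective that underpins much of the unsupervised depth and ego-motion learning surveyed here; the exact formulation varies between methods, and the symbols below are illustrative rather than taken from any single paper. Given a target frame $I_t$, a neighbouring source frame $I_s$, a predicted per-pixel depth $\hat{D}_t$, a predicted relative camera pose $\hat{T}_{t \to s}$, and known camera intrinsics $K$, each target pixel $p_t$ is projected into the source view by

$$ p_s \sim K \, \hat{T}_{t \to s} \, \hat{D}_t(p_t) \, K^{-1} p_t , $$

and the depth and pose networks are trained jointly by minimising the photometric reconstruction error

$$ \mathcal{L}_{\mathrm{photo}} = \sum_{s} \sum_{p_t} \big| I_t(p_t) - \hat{I}_s(p_t) \big| , $$

where $\hat{I}_s$ denotes the source image warped into the target view using the predicted depth and pose. Because this loss couples $\hat{D}_t$ and $\hat{T}_{t \to s}$ through the same warping, progress in learned depth estimation feeds directly into learned ego-motion estimation, which is the evolution described above.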

Keywords

SLAM · Deep learning · Depth estimation · Pose estimation · Semantic mapping

Notes

Acknowledgments

The authors are grateful to the reviewers for their valuable comments that considerably contributed to improving this paper.

Funding Information

The first author has been financially supported by a scholarship from the China Scholarship Council.

Compliance with Ethical Standards

Conflict of Interest

The authors declare that they have no conflict of interest.

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
  2. Edinburgh Centre for Robotics, Heriot-Watt University, Edinburgh, UK
