3D Shape Reconstruction in Traffic Scenarios Using Monocular Camera and Lidar

  • Qing Rao
  • Lars Krüger
  • Klaus Dietmayer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10117)


In the near future, a self-driving car will be able to perceive and understand its surroundings by composing a 3D environment map at the object level. In this map, the 3D shapes of surrounding objects will be precisely reconstructed. This paper presents a technique for reconstructing 3D object shapes using a monocular camera and a Lidar. The proposed approach combines deep neural networks with an optimization process, called 3D Shaping, in which object pose and shape are jointly optimized. Quantitative evaluation shows that the proposed approach significantly improves the estimation of 3D object orientation and of the occupancy bounding box.


Keywords: Discrete Cosine Transform; Convolutional Neural Network; Deep Neural Network; Signed Distance Function; Orientation Estimation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Supplementary material

Supplementary material 1: 426014_1_En_1_MOESM1_ESM.pdf (PDF, 360 KB)


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Daimler AG, Ulm, Germany
  2. Ulm University, Ulm, Germany
