Abstract
3D visual object detection is a fundamental requirement for autonomous vehicles. Until recently, however, accurate 3D object detection was possible only with expensive LiDAR ranging devices; approaches based on cheaper monocular imagery typically fail to localize objects in 3D. In this paper, we propose a novel approach to predict accurate 3D bounding box locations from monocular images. We first train a generative adversarial network (GAN) to perform monocular depth estimation, with ground-truth training depth obtained via depth completion on LiDAR scans. Next, we combine depth and appearance data into a bird's-eye-view representation with height, density, and grayscale intensity as the three feature channels. Finally, we train a convolutional neural network (CNN) on this feature map, leveraging bounding boxes annotated on the corresponding LiDAR scans. Experiments show that our method performs favorably against baselines.
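The bird's-eye-view encoding described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grid extents, cell resolution, and per-cell aggregation rules (max height, max intensity, log-normalized density) are assumptions in the style of Complex-YOLO-like encodings, and `bev_feature_map` is a hypothetical helper name.

```python
import numpy as np

def bev_feature_map(points, intensities,
                    x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                    z_range=(-2.0, 1.25), resolution=0.1):
    """Project a point cloud (N, 3) plus per-point intensities (N,)
    into a 3-channel bird's-eye-view grid: height, density, intensity."""
    nx = int((x_range[1] - x_range[0]) / resolution)
    ny = int((y_range[1] - y_range[0]) / resolution)

    # Keep only points inside the region of interest.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
            (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    pts, ints = points[mask], intensities[mask]

    # Discretize x/y coordinates into grid cell indices.
    xi = ((pts[:, 0] - x_range[0]) / resolution).astype(int)
    yi = ((pts[:, 1] - y_range[0]) / resolution).astype(int)

    height = np.zeros((nx, ny), dtype=np.float32)
    density = np.zeros((nx, ny), dtype=np.float32)
    intensity = np.zeros((nx, ny), dtype=np.float32)

    for x, y, z, r in zip(xi, yi, pts[:, 2], ints):
        z_norm = (z - z_range[0]) / (z_range[1] - z_range[0])
        height[x, y] = max(height[x, y], z_norm)   # tallest return per cell
        intensity[x, y] = max(intensity[x, y], r)  # brightest return per cell
        density[x, y] += 1.0                       # raw point count per cell

    # Log-normalize the point count so dense cells saturate at 1.0.
    density = np.minimum(1.0, np.log(density + 1.0) / np.log(64.0))
    return np.stack([height, density, intensity], axis=-1)
```

In the monocular setting, the input `points` would come from back-projecting the GAN-predicted depth map through the camera intrinsics rather than from a LiDAR scan; the resulting (H, W, 3) map can then be fed to any 2D CNN detector.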
© 2019 Springer Nature Switzerland AG
Cite this paper
Wang, Q., Rasmussen, C. (2019). 3D Visual Object Detection from Monocular Images. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2019. Lecture Notes in Computer Science, vol. 11844. Springer, Cham. https://doi.org/10.1007/978-3-030-33720-9_13