
3D Visual Object Detection from Monocular Images

  • Conference paper
Advances in Visual Computing (ISVC 2019)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11844)


Abstract

3D visual object detection is a fundamental requirement for autonomous vehicles. However, accurate 3D object detection was until recently a capability unique to expensive LiDAR ranging devices, and approaches based on cheaper monocular imagery are typically incapable of localizing objects in 3D. In this paper, we propose a novel approach to predict accurate 3D bounding box locations from monocular images. We first train a generative adversarial network (GAN) to perform monocular depth estimation; the ground-truth depth used for training is obtained via depth completion on LiDAR scans. Next, we combine depth and appearance data into a bird's-eye-view representation with height, density, and grayscale intensity as the three feature channels. Finally, we train a convolutional neural network (CNN) on our feature map, leveraging bounding boxes annotated on the corresponding LiDAR scans. Experiments show that our method performs favorably against baselines.
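The bird's-eye-view encoding described above can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's exact implementation: the grid ranges, cell resolution, and the log-normalized density channel (a common choice in Complex-YOLO-style BEV maps) are assumptions, and the per-point intensities here stand in for the grayscale appearance values back-projected with the estimated depth.

```python
import numpy as np

def bev_feature_map(points, intensities, x_range=(0.0, 40.0),
                    y_range=(-20.0, 20.0), z_range=(-2.0, 1.0),
                    resolution=0.1):
    """Project a point cloud (N, 3) with per-point grayscale intensity (N,)
    into a BEV grid with three channels: max height, point density,
    and mean intensity per cell. All parameters are illustrative."""
    h = int((x_range[1] - x_range[0]) / resolution)
    w = int((y_range[1] - y_range[0]) / resolution)
    fmap = np.zeros((h, w, 3), dtype=np.float32)

    # Keep only points inside the region of interest.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
            (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    pts, inten = points[mask], intensities[mask]

    # Discretize x/y into grid cell indices.
    xi = ((pts[:, 0] - x_range[0]) / resolution).astype(int)
    yi = ((pts[:, 1] - y_range[0]) / resolution).astype(int)

    counts = np.zeros((h, w), dtype=np.float32)
    inten_sum = np.zeros((h, w), dtype=np.float32)
    for i in range(len(pts)):
        # Height channel: max normalized height per cell.
        z_norm = (pts[i, 2] - z_range[0]) / (z_range[1] - z_range[0])
        fmap[xi[i], yi[i], 0] = max(fmap[xi[i], yi[i], 0], z_norm)
        counts[xi[i], yi[i]] += 1
        inten_sum[xi[i], yi[i]] += inten[i]

    # Density channel: log-normalized point count, clipped to [0, 1].
    fmap[:, :, 1] = np.minimum(1.0, np.log(counts + 1) / np.log(64))
    # Intensity channel: mean grayscale intensity per occupied cell.
    occupied = counts > 0
    fmap[occupied, 2] = inten_sum[occupied] / counts[occupied]
    return fmap
```

The resulting (H, W, 3) map can be fed to a standard 2D CNN detector, since the 3D localization problem has been reduced to 2D detection plus per-cell height information.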



Author information


Correspondence to Qiaosong Wang.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, Q., Rasmussen, C. (2019). 3D Visual Object Detection from Monocular Images. In: Bebis, G., et al. (eds.) Advances in Visual Computing. ISVC 2019. Lecture Notes in Computer Science, vol. 11844. Springer, Cham. https://doi.org/10.1007/978-3-030-33720-9_13


  • DOI: https://doi.org/10.1007/978-3-030-33720-9_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33719-3

  • Online ISBN: 978-3-030-33720-9

  • eBook Packages: Computer Science (R0)
