Improving 3D Object Detection Through Progressive Population Based Augmentation

Cheng, Shuyang; Leng, Zhaoqi; Cubuk, Ekin Dogus; Zoph, Barret; Bai, Chunyan; Ngiam, Jiquan; Song, Yang; Caine, Benjamin; Vasudevan, Vijay; Li, Congcong; Le, Quoc V.; Shlens, Jonathon; Anguelov, Dragomir

doi:10.1007/978-3-030-58589-1_17


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12366)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Data augmentation has been widely adopted for object detection in 3D point clouds. However, all previous related efforts have focused on manually designing specific data augmentation methods for individual architectures. In this work, we present the first attempt to automate the design of data augmentation policies for 3D object detection. We introduce the Progressive Population Based Augmentation (PPBA) algorithm, which learns to optimize augmentation strategies by narrowing down the search space and adopting the best parameters discovered in previous iterations. On the KITTI 3D detection test set, PPBA improves the StarNet detector by substantial margins on the moderate difficulty category of cars, pedestrians, and cyclists, outperforming all current state-of-the-art single-stage detection models. Additional experiments on the Waymo Open Dataset indicate that PPBA continues to effectively improve the StarNet and PointPillars detectors on a 20x larger dataset compared to KITTI. The magnitude of the improvements may be comparable to advances in 3D perception architectures and the gains come without an incurred cost at inference time. In subsequent experiments, we find that PPBA may be up to 10x more data efficient than baseline 3D detection models without augmentation, highlighting that 3D detection models may achieve competitive accuracy with far fewer labeled examples.
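The two ideas named in the abstract, narrowing the search space at each iteration and adopting the best parameters found in earlier iterations, can be illustrated with a toy population-based search. The sketch below is not the authors' implementation: the operation names, the `toy_score` objective (a stand-in for validation accuracy), the perturbation range, and the rule of always restarting an operation from its best recorded value are all assumptions made for illustration.

```python
import random

# Hypothetical operations, each with a single magnitude in [0, 1].
OPS = ["op_a", "op_b", "op_c", "op_d"]
# Hypothetical optimum used only by the toy objective below.
TARGET = {"op_a": 0.2, "op_b": 0.8, "op_c": 0.5, "op_d": 0.1}


def toy_score(params):
    """Stand-in for validation accuracy: higher when params are near TARGET."""
    return -sum((params[op] - TARGET[op]) ** 2 for op in OPS)


def progressive_search(iterations=30, population=8, ops_per_iter=2, seed=0):
    rng = random.Random(seed)
    # Every member starts from the same neutral policy.
    members = [{op: 0.5 for op in OPS} for _ in range(population)]
    best_params = dict(members[0])
    best_score = toy_score(best_params)
    best_per_op = {}  # op -> (score, value) recorded in earlier iterations

    for _ in range(iterations):
        scored = []
        for params in members:
            # Narrowed search space: only a few operations change per iteration.
            changed = rng.sample(OPS, ops_per_iter)
            for op in changed:
                # Assumed rule: restart from the best value recorded so far.
                if op in best_per_op:
                    params[op] = best_per_op[op][1]
                step = rng.uniform(-0.1, 0.1)
                params[op] = min(1.0, max(0.0, params[op] + step))
            scored.append((toy_score(params), params, changed))

        # Record the best value seen for each operation and the best policy.
        for score, params, changed in scored:
            for op in changed:
                if op not in best_per_op or score > best_per_op[op][0]:
                    best_per_op[op] = (score, params[op])
            if score > best_score:
                best_score, best_params = score, dict(params)

        # Exploit: the top half of the population replaces the bottom half.
        scored.sort(key=lambda item: item[0], reverse=True)
        survivors = [dict(item[1]) for item in scored[: max(1, population // 2)]]
        members = [dict(survivors[i % len(survivors)]) for i in range(population)]

    return best_params, best_score


if __name__ == "__main__":
    params, score = progressive_search()
    print(params, score)
```

In a real setting the score would come from training and evaluating a detector under each candidate policy, which is far more expensive than this toy objective; the sketch only shows the control flow of the search.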

Z. Leng—Work done while at Google LLC.

Notes

1. http://github.com/tensorflow/lingvo.
2. http://github.com/tensorflow/lingvo.
3. Our initial experiment on random search shows the performance distribution of augmentation policies is spread on the KITTI validation split. In order to save computation resources, the random search here is performed on a fine-grained search space.


Author information

Authors and Affiliations

Waymo LLC, Mountain View, USA
Shuyang Cheng, Zhaoqi Leng, Chunyan Bai, Yang Song, Congcong Li & Dragomir Anguelov
Google LLC, Mountain View, USA
Ekin Dogus Cubuk, Barret Zoph, Jiquan Ngiam, Benjamin Caine, Vijay Vasudevan, Quoc V. Le & Jonathon Shlens

Corresponding author

Correspondence to Shuyang Cheng.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 188 KB)

About this paper

Cite this paper

Cheng, S. et al. (2020). Improving 3D Object Detection Through Progressive Population Based Augmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_17

DOI: https://doi.org/10.1007/978-3-030-58589-1_17
Published: 12 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58588-4
Online ISBN: 978-3-030-58589-1
eBook Packages: Computer Science, Computer Science (R0)
