An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12363)

Abstract

Detecting objects in 3D LiDAR data is a core technology for autonomous driving and other robotics applications. Although LiDAR data is acquired over time, most 3D object detection algorithms propose object bounding boxes independently for each frame and neglect the useful information available in the temporal domain. To address this problem, we propose a sparse LSTM-based multi-frame 3D object detection algorithm. We use a U-Net style 3D sparse convolution network to extract features from each frame's LiDAR point cloud. These features are fed to the LSTM module together with the hidden and memory features from the previous frame to predict the 3D objects in the current frame, as well as the hidden and memory features that are passed on to the next frame. Experiments on the Waymo Open Dataset show that our algorithm outperforms the traditional frame-by-frame approach by 7.5% mAP@0.7 and other multi-frame approaches by 1.2%, while using less memory and computation per frame. To the best of our knowledge, this is the first work to use an LSTM for 3D object detection in sparse point clouds.
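
To make the recurrence concrete, the sketch below shows the per-frame data flow the abstract describes, written in plain PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the dense backbone and linear detection head stand in for the paper's U-Net style sparse 3D convolutions, and the class name, feature dimension, box parameterization, and fixed point count per frame are all placeholders chosen for the example.

```python
# Minimal PyTorch sketch of the per-frame LSTM recurrence (illustrative
# stand-in, not the paper's sparse-convolution implementation).
import torch
import torch.nn as nn

class TemporalDetector(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Dense stand-in for the U-Net style 3D sparse convolution backbone.
        self.backbone = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU())
        # LSTM cell fuses current-frame features with the hidden/memory
        # state carried over from the previous frame.
        self.cell = nn.LSTMCell(feat_dim, feat_dim)
        # Stand-in detection head: a 7-DoF box (center, size, heading)
        # plus an objectness score per point.
        self.head = nn.Linear(feat_dim, 8)

    def forward(self, frames):
        h = c = None
        boxes_per_frame = []
        for points in frames:                 # points: (N, 3) LiDAR frame
            feats = self.backbone(points)     # per-point features, (N, D)
            if h is None:                     # first frame: zero state
                h = torch.zeros_like(feats)
                c = torch.zeros_like(feats)
            h, c = self.cell(feats, (h, c))   # temporal fusion step
            boxes_per_frame.append(self.head(h))
        return boxes_per_frame

# Toy usage: five synthetic frames with a fixed number of points (real
# sequences have varying point counts, which the paper's sparse
# formulation handles; a fixed N keeps this sketch simple).
frames = [torch.randn(128, 3) for _ in range(5)]
outputs = TemporalDetector()(frames)
print([o.shape for o in outputs])             # 5 x (128, 8)
```

Because only the hidden and memory tensors cross frame boundaries, each step processes a single frame's points, which is where the claimed per-frame memory and computation savings over multi-frame concatenation come from.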

Keywords

3D object detection · LSTM · Point cloud

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

Google Research, Mountain View, US
