Rotation-Robust Intersection over Union for 3D Object Detection

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12365)


In this paper, we propose a Rotation-robust Intersection over Union (\(\textit{RIoU}\)) for 3D object detection, which aims to learn the overlap of rotated bounding boxes. Most existing 3D object detection methods adopt a norm-based loss that regresses each bounding-box parameter individually, which may suffer from a loss-metric mismatch due to the scaling problem. Motivated by the IoU loss in axis-aligned 2D object detection, which is invariant to scale, our method jointly optimizes the parameters via the \(\textit{RIoU}\) loss. To tackle the uncertainty of the convex intersection region caused by rotation, we define a projection operation to estimate the intersection area. The calculation of \(\textit{RIoU}\) and its loss function comprises only basic numerical operations, so it is robust to rotation and feasible for back-propagation. By combining the \(\textit{RIoU}\) loss with the conventional norm-based loss function, we enforce the network to directly optimize the \(\textit{RIoU}\). Experimental results on the KITTI, nuScenes and SUN RGB-D datasets validate the effectiveness of our proposed method. Moreover, we show that our method is also suitable for detecting 2D rotated objects, such as text boxes and cluttered targets in aerial images.
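The loss-metric mismatch mentioned in the abstract can be illustrated with a small sketch. The code below is not the \(\textit{RIoU}\) computation; it uses only plain axis-aligned 2D boxes, and the helper names (`iou_axis_aligned`, `l1_distance`, `scale`, `shift_x`) and the box values are hypothetical choices made for illustration. It shows that the same per-parameter L1 error can correspond to very different IoU values depending on box size, whereas IoU is unchanged when both boxes are scaled together.

```python
def iou_axis_aligned(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def l1_distance(a, b):
    """Sum of absolute per-parameter differences (a norm-based error)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def scale(box, s):
    """Scale every coordinate of a box by the factor s."""
    return tuple(s * v for v in box)


def shift_x(box, d):
    """Translate a box by d along the x axis."""
    return (box[0] + d, box[1], box[2] + d, box[3])


# Illustrative (assumed) ground-truth boxes: one small, one large.
small_gt = (0.0, 0.0, 2.0, 2.0)
large_gt = (0.0, 0.0, 20.0, 20.0)

# Both predictions are off by the same absolute amount along x.
small_pred = shift_x(small_gt, 1.0)
large_pred = shift_x(large_gt, 1.0)

l1_small = l1_distance(small_gt, small_pred)
l1_large = l1_distance(large_gt, large_pred)
iou_small = iou_axis_aligned(small_gt, small_pred)
iou_large = iou_axis_aligned(large_gt, large_pred)

# Scaling ground truth and prediction together leaves IoU unchanged.
iou_small_scaled = iou_axis_aligned(scale(small_gt, 10.0), scale(small_pred, 10.0))

print("L1 error (small, large):", l1_small, l1_large)
print("IoU (small, large):", iou_small, iou_large)
print("IoU of small pair after 10x scaling:", iou_small_scaled)
```

In this toy setting the norm-based error is identical for the two pairs while their IoU differs substantially, which is the kind of mismatch an IoU-based loss is meant to avoid. Handling rotated boxes requires the projection-based intersection estimate described in the paper, which this sketch does not attempt.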


3D object detection, Loss function, Rotation-robust



This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFA0700802, in part by the National Natural Science Foundation of China under Grant 61822603, Grant U1813218, Grant U1713214, and Grant 61672306, in part by Beijing Academy of Artificial Intelligence (BAAI), in part by a grant from the Institute for Guo Qiang, Tsinghua University, in part by the Shenzhen Fundamental Research Fund (Subject Arrangement) under Grant JCYJ20170412170602564, and in part by Tsinghua University Initiative Scientific Research Program.

Supplementary material

Supplementary material 1: 504476_1_En_28_MOESM1_ESM.pdf (PDF, 1182 KB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Automation, Tsinghua University, Beijing, China
  2. State Key Lab of Intelligent Technologies and Systems, Beijing, China
  3. Beijing National Research Center for Information Science and Technology, Beijing, China
  4. Tsinghua Shenzhen International Graduate School, Tsinghua University, Beijing, China
