PIoU Loss: Towards Accurate Oriented Object Detection in Complex Environments

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12350)

Abstract

Object detection using an oriented bounding box (OBB) can better target rotated objects by reducing the overlap with background areas. Existing OBB approaches are mostly built on horizontal bounding box detectors by introducing an additional angle dimension optimized by a distance loss. However, as the distance loss only minimizes the angle error of the OBB and correlates loosely with the IoU, it is insensitive to objects with high aspect ratios. Therefore, a novel loss, Pixels-IoU (PIoU) Loss, is formulated to exploit both the angle and the IoU for accurate OBB regression. The PIoU loss is derived from the IoU metric in a pixel-wise form, which is simple and suitable for both horizontal and oriented bounding boxes. To demonstrate its effectiveness, we evaluate the PIoU loss on both anchor-based and anchor-free frameworks. The experimental results show that PIoU loss can dramatically improve the performance of OBB detectors, particularly for objects with high aspect ratios and complex backgrounds. In addition, because previous evaluation datasets did not include scenarios where the objects have high aspect ratios, a new dataset, Retail50K, is introduced to encourage the community to adapt OBB detectors for more complex environments.
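To illustrate the pixel-wise IoU idea underlying the loss, the sketch below computes the IoU of two oriented boxes by sampling pixels and counting those inside the intersection and the union. This is a hard-counting illustration of the metric only, not the paper's differentiable kernel formulation; the box parameterization `(cx, cy, w, h, theta)` and the sampling step are assumptions for this example.

```python
import math

def point_in_obb(px, py, box):
    """Test whether point (px, py) lies inside an oriented box
    given as (cx, cy, w, h, theta), theta in radians."""
    cx, cy, w, h, theta = box
    dx, dy = px - cx, py - cy
    # Rotate the offset into the box's local frame.
    tx = dx * math.cos(theta) + dy * math.sin(theta)
    ty = -dx * math.sin(theta) + dy * math.cos(theta)
    return abs(tx) <= w / 2 and abs(ty) <= h / 2

def obb_corners(box):
    """Return the four corner points of an oriented box."""
    cx, cy, w, h, theta = box
    c, s = math.cos(theta), math.sin(theta)
    return [
        (cx + sx * w / 2 * c - sy * h / 2 * s,
         cy + sx * w / 2 * s + sy * h / 2 * c)
        for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1))
    ]

def pixel_iou(box1, box2, step=0.5):
    """Approximate IoU of two oriented boxes by dense point sampling
    over the axis-aligned region covering both boxes."""
    pts = obb_corners(box1) + obb_corners(box2)
    x0, x1 = min(p[0] for p in pts), max(p[0] for p in pts)
    y0, y1 = min(p[1] for p in pts), max(p[1] for p in pts)
    inter = union = 0
    x = x0
    while x <= x1:
        y = y0
        while y <= y1:
            in1 = point_in_obb(x, y, box1)
            in2 = point_in_obb(x, y, box2)
            inter += in1 and in2
            union += in1 or in2
            y += step
        x += step
    return inter / union if union else 0.0
```

Because the count is over individual pixels, the measure remains meaningful for thin, elongated boxes where an angle-only distance loss barely changes; the paper replaces the hard inside/outside test with a smooth kernel so the quantity becomes differentiable and usable as a training loss.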

Keywords

Oriented object detection · IoU loss

Notes

Acknowledgements

The paper is supported in part by the following grants: China Major Project for New Generation of AI Grant (No. 2018AAA0100400), National Natural Science Foundation of China (No. 61971277). The work is also supported by funding from Clobotics under the Joint Research Program of Smart Retail.


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Clobotics, Shanghai, China
  2. Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
  3. Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Malaysia