Rich Features and Precise Localization with Region Proposal Network for Object Detection

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10568)

Abstract

Deep networks have greatly accelerated the development of object detection. Recent advances are mainly attributed to the combination of deep networks and region proposal methods [1, 2, 3]. However, detection accuracy on complicated datasets is still unsatisfactory, especially for small objects, mainly because the convolutional feature maps are coarse. In this paper, we design a new strategy for generating region proposals and propose a new localization method for object detection. In contrast to baseline detectors such as Fast R-CNN [4] and Faster R-CNN [5], our method uses adjacent-level feature maps at all scales to generate region proposals and adopts a cascaded region proposal network (RPN) to fine-tune the location of the bounding box. Compared with other state-of-the-art methods, our method achieves the best recall and object detection accuracy.
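The cascaded localization idea can be illustrated with a minimal sketch. The abstract does not give the box parameterization, so the code below assumes the center/size offsets (dx, dy, dw, dh) commonly used in the R-CNN family. The function names (`apply_deltas`, `cascade_refine`, `iou`), the anchor, the ground-truth box, and the per-stage offsets are all hypothetical values chosen for illustration; in a real detector each stage's offsets would be predicted by the network.

```python
import math

def apply_deltas(box, deltas):
    """Shift and rescale a box (x1, y1, x2, y2) by offsets (dx, dy, dw, dh).

    Assumed parameterization: dx, dy move the center in units of the box
    width/height; dw, dh scale the width/height in log space.
    """
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = deltas
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def cascade_refine(anchor, stage_deltas):
    """Apply each stage's offsets to the box produced by the previous stage."""
    boxes = [anchor]
    for deltas in stage_deltas:
        boxes.append(apply_deltas(boxes[-1], deltas))
    return boxes

# Hypothetical example: one anchor, one ground-truth box, two refinement stages.
anchor = (0.0, 0.0, 100.0, 100.0)
ground_truth = (20.0, 20.0, 100.0, 100.0)
stage_deltas = [(0.1, 0.1, -0.1, -0.1),   # stage 1: made-up offsets
                (0.0, 0.0, -0.1, -0.1)]   # stage 2: made-up offsets

boxes = cascade_refine(anchor, stage_deltas)
ious = [iou(b, ground_truth) for b in boxes]
for stage, (b, score) in enumerate(zip(boxes, ious)):
    print(stage, [round(v, 2) for v in b], round(score, 3))
```

With these hand-picked offsets, the overlap with the ground-truth box grows after each stage, which is the behavior a cascade is meant to produce. Recall at a given IoU threshold is then the fraction of ground-truth boxes matched by at least one proposal above that threshold.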

Keywords

Object detection, Proposal, Features, Localization, Cascaded

References

  1. Uijlings, J.R., van de Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104, 154–171 (2013)
  2. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 346–361. Springer, Cham (2014). doi:10.1007/978-3-319-10578-9_23
  3. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. arXiv preprint arXiv:1506.02640 (2015)
  4. Girshick, R.: Fast R-CNN. In: ICCV (2015)
  5. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)
  6. Hosang, J., Benenson, R., Dollár, P., Schiele, B.: What makes for effective detection proposals? IEEE Trans. Pattern Anal. Mach. Intell. 38, 814–830 (2015)
  7. Szegedy, C., Reed, S., Erhan, D., Anguelov, D.: Scalable, high-quality object detection. arXiv:1412.1441 (v1) (2015)
  8. Gidaris, S., Komodakis, N.: Object detection via a multi-region and semantic segmentation-aware CNN model. In: ICCV (2015)
  9. Hariharan, B., Arbeláez, P., Girshick, R., Malik, J.: Hypercolumns for object segmentation and fine-grained localization. In: CVPR (2015)
  10. Viola, P., Jones, M.J.: Robust real-time face detection. IJCV 57, 137–154 (2004)
  11. Liu, M.-Y., Mallya, A., Tuzel, O., Chen, X.: Unsupervised network pretraining via encoding human design. In: 2016 IEEE Winter Conference on Applications of Computer Vision, pp. 1–9. IEEE (2016)
  12. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: OverFeat: integrated recognition, localization and detection using convolutional networks. In: ICLR (2014)
  13. Ghodrati, A., Pedersoli, M., Tuytelaars, T., Diba, A., Gool, L.V.: Deep proposal: hunting objects by cascading deep convolutional layers. In: ICCV (2015)
  14. Hua, Y., Alahari, K., Schmid, C.: Online object tracking with proposal selection. In: ICCV (2015)
  15. Jia, Y., Han, M.: Category-independent object-level saliency detection. In: ICCV (2013)
  16. Guo, K., Wu, S., Xu, Y.: Face recognition using both visible light image and near-infrared image and a deep network. CAAI Trans. Intell. Technol. 2(1), 39–47 (2017)
  17. Cai, Z., Fan, Q., Feris, R.S., Vasconcelos, N.: A unified multi-scale deep convolutional neural network for fast object detection. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 354–370. Springer, Cham (2016). doi:10.1007/978-3-319-46493-0_22
  18. Bell, S., Zitnick, C.L., Bala, K., et al.: Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks, pp. 2874–2883 (2016)
  19. Lin, T.Y., et al.: Feature pyramid networks for object detection (2016)
  20. Xu, Y., Zhang, B., Zhong, Z.: Multiple representations and sparse representation for image classification. Pattern Recogn. Lett. 68, 9–14 (2015)
  21. Dai, J., Li, Y., He, K., et al.: R-FCN: object detection via region-based fully convolutional networks (2016)
  22. Dollár, P., Appel, R., Belongie, S., Perona, P.: Fast feature pyramids for object detection. PAMI 36(8), 1532–1545 (2014)
  23. Yang, B., Yan, J., Lei, Z., et al.: Craft objects from images, pp. 6043–6051 (2016)
  24. Everingham, M., Eslami, S.M.A., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The pascal visual object classes challenge: a retrospective. IJCV 111, 98–136 (2015)
  25. Lin, G., Milan, A., Shen, C., et al.: RefineNet: multi-path refinement networks for high-resolution semantic segmentation (2016)
  26. Ghodrati, A., Pedersoli, M., Tuytelaars, T., Diba, A., Van Gool, L.: Deep boxes: hunting objects by cascading deep convolutional layers. In: Proceedings ICCV (2015)
  27. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 346–361. Springer, Cham (2014). doi:10.1007/978-3-319-10578-9_23
  28. Shrivastava, A., Gupta, A., Girshick, R.: Training region based object detectors with online hard example mining. In: CVPR (2016)
  29. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. Comput. Sci. (2015)
  30. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
  31. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: Imagenet large scale visual recognition challenge. IJCV 115, 211–252 (2015)
  32. Kong, T., Yao, A., Chen, Y., et al.: HyperNet: towards accurate region proposal generation and joint object detection, pp. 845–853 (2016)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Computer Science and Technology, Harbin Institute of Technology, Harbin, China
