
Revisiting Faster R-CNN: A Deeper Look at Region Proposal Network

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10636)

Abstract

Currently, state-of-the-art object detectors are based on Faster R-CNN. We first revisit Faster R-CNN and examine its problems, e.g., feature maps that are too coarse for accurate localization, fixed-window feature extraction in the RPN, and insensitivity to small-scale objects. We then propose a novel object detection network to address these problems. Specifically, we use a two-stage cascade multi-scale proposal generation network to obtain highly accurate proposals: an original RPN first generates coarse proposals, and a second network with multi-layer features and an RoI pooling layer then refines them. The second stage also generates small-scale proposals at the same time. After that, a detection network with multi-layer features further classifies and refines the proposals. A novel 3-step joint training algorithm is introduced to optimize our model. Experiments on PASCAL VOC 2007 and 2012 demonstrate the effectiveness of our network.
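
The coarse-to-fine flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `refine_head` and `detect_head` are hypothetical callables standing in for the second-stage refinement network and the detection network, and the box update is assumed to follow the standard (dx, dy, dw, dh) parameterization used in Faster R-CNN. Multi-layer feature aggregation, RoI pooling, and the extra small-scale proposals are omitted.

```python
import math


def apply_deltas(box, deltas):
    """Refine one (x1, y1, x2, y2) box with (dx, dy, dw, dh) offsets.

    Assumption: the standard Faster R-CNN box parameterization, where the
    centre shifts in proportion to the box size and the size scales
    exponentially.
    """
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = deltas
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def cascade_detect(coarse_boxes, refine_head, detect_head):
    """Hypothetical cascade: coarse proposal -> refined proposal -> detection.

    refine_head(box) returns box deltas (stands in for the second stage).
    detect_head(box) returns (label, deltas) (stands in for the detector).
    """
    detections = []
    for box in coarse_boxes:
        refined = apply_deltas(box, refine_head(box))
        label, deltas = detect_head(refined)
        detections.append((label, apply_deltas(refined, deltas)))
    return detections
```

In this sketch each stage only nudges the boxes it receives from the previous stage, which is the sense in which the cascade moves from coarse proposals to progressively more accurate ones.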

Keywords

Faster R-CNN, General object detection, Multi-scale object proposal, Multi-layer feature aggregation

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Tsinghua National Laboratory for Information Science and Technology (TNList), Institute for Network Sciences and Cyberspace (INSC), Tsinghua University, Beijing, China
