
Improving Object Detection with Convolutional Neural Network via Iterative Mechanism

  • Xin Qiu
  • Chun Yuan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10636)

Abstract

Iterative mechanisms are widely used in many fields, since iterating a simple function can produce complex behavior. However, state-of-the-art convolutional neural network (CNN)-based object detection methods often overlook them. In this paper, we propose using an iterative mechanism to improve the object detection performance of CNN algorithms. To show its benefits from more than one angle, our work makes two main contributions. First, we train an iterative version of Faster RCNN to show how the iterative mechanism improves localization accuracy. Second, we present a prototype CNN model that iteratively searches for objects on a very simple dataset to generate proposals. Experiments on object detection benchmark datasets show that the two proposed iterative methods consistently improve on the baseline methods; e.g., on the PASCAL VOC2007 test set, our iterative version of Faster RCNN reaches 0.7115 mAP, about 1.5 points higher than the baseline Faster RCNN (0.6959 mAP).
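The idea of improving localization by iterating a simple function can be illustrated with a minimal sketch. This is not the authors' iterative Faster RCNN: the toy regressor below, the helper names (`iou`, `make_toy_regressor`, `refine_iteratively`), the step size, and the box coordinates are all assumptions made for illustration. A real detector would predict each correction from image features rather than from a known target box.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def make_toy_regressor(target: Box, step: float = 0.5) -> Callable[[Box], Box]:
    """Hypothetical stand-in for a learned box regressor.

    Each call moves every coordinate a fixed fraction of the way toward
    the target, mimicking a regressor that only partially corrects a box.
    """
    def regress(box: Box) -> Box:
        return tuple(c + step * (t - c) for c, t in zip(box, target))
    return regress


def refine_iteratively(box: Box,
                       regress: Callable[[Box], Box],
                       num_iters: int) -> List[Box]:
    """Feed the regressor's output back in as its next input."""
    history = [box]
    for _ in range(num_iters):
        box = regress(box)
        history.append(box)
    return history


if __name__ == "__main__":
    target = (10.0, 10.0, 50.0, 50.0)   # assumed ground-truth box
    initial = (0.0, 0.0, 30.0, 30.0)    # assumed coarse proposal
    boxes = refine_iteratively(initial, make_toy_regressor(target), 3)
    for i, b in enumerate(boxes):
        print(f"iteration {i}: IoU with target = {iou(b, target):.4f}")
```

In this toy setting, each pass through the same simple function brings the box closer to the target, so the IoU rises with every iteration. Whether a learned regressor behaves this way depends on how it is trained.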

Keywords

Object detection, Convolutional neural network, Iterative

Notes

Acknowledgments

This work is supported by the National High Technology Research and Development Plan (863 Plan) under Grant No. 2015AA015800, the NSFC project under Grant No. U1433112, and the Joint Research Center of Tencent & Tsinghua University.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
