Improving Object Detection with Convolutional Neural Network via Iterative Mechanism
The iterative mechanism is widely used in many fields, since iterating a simple function can produce complex behavior. Yet state-of-the-art convolutional neural network (CNN)-based object detection methods often overlook it. In this paper, we propose to use the iterative mechanism to improve the object detection performance of CNN algorithms. To show its benefits from more than one angle, our work makes two main contributions. First, we train an iterative version of Faster RCNN to show how the iterative mechanism improves localization accuracy. Second, we present a prototype CNN model that iteratively searches for objects on a very simple dataset to generate proposals. Experiments on object detection benchmark datasets show that the two proposed iterative methods consistently improve on their baselines; e.g., on the PASCAL VOC2007 test set, our iterative version of Faster RCNN reaches 0.7115 mAP, about 1.5 points higher than the baseline Faster RCNN (0.6959 mAP).
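As an illustration of how iteration can improve localization, the following minimal sketch repeatedly applies a box regressor and feeds each refined box back in as the next input. It is not the iterative Faster RCNN trained in this work: `toy_regressor` is a hypothetical stand-in that reads the target box directly, whereas a learned regression head would predict the offsets from image features. The box format `(cx, cy, w, h)`, the `step` parameter, and all function names are assumptions made for the example; only the `(dx, dy, dw, dh)` offset parameterization follows the usual R-CNN convention.

```python
import math


def apply_deltas(box, deltas):
    """Apply R-CNN style offsets (dx, dy, dw, dh) to a box (cx, cy, w, h)."""
    cx, cy, w, h = box
    dx, dy, dw, dh = deltas
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def toy_regressor(box, target, step=0.5):
    """Hypothetical regressor: returns a fraction `step` of the exact offsets.

    A real detector would estimate these offsets from CNN features; here the
    target is read directly so the sketch stays self-contained.
    """
    cx, cy, w, h = box
    tx, ty, tw, th = target
    return (
        step * (tx - cx) / w,
        step * (ty - cy) / h,
        step * math.log(tw / w),
        step * math.log(th / h),
    )


def iou(a, b):
    """Intersection over union of two (cx, cy, w, h) boxes."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)


def refine(box, target, num_iters=3):
    """Iteratively refine `box`; returns the initial box plus one box per step."""
    boxes = [box]
    for _ in range(num_iters):
        box = apply_deltas(box, toy_regressor(box, target))
        boxes.append(box)
    return boxes


if __name__ == "__main__":
    target = (60.0, 55.0, 60.0, 50.0)
    for i, b in enumerate(refine((50.0, 50.0, 40.0, 40.0), target)):
        print(i, round(iou(b, target), 3))
```

Because each pass corrects only part of the remaining error, the overlap with the target grows from one iteration to the next in this toy setting, which is the behavior the iterative mechanism is meant to exploit.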
Keywords: Object detection, Convolutional neural network, Iterative
This work is supported by the National High Technology Research and Development Plan (863 Plan) under Grant No. 2015AA015800, the NSFC project under Grant No. U1433112, and the Joint Research Center of Tencent & Tsinghua University.