
A single-shot multi-level feature reused neural network for object detection

Abstract

Recent years have witnessed significant progress in object detection with deep convolutional neural networks. However, few object detectors achieve high precision at low computational cost. In this paper, a novel lightweight framework named multi-level feature reused detector (MFRDet) is proposed, which reaches better accuracy than two-stage methods while maintaining the high efficiency of one-stage methods, without employing the very deep convolutional neural networks that most modern detectors rely on. The framework reuses the information contained in both deep and shallow feature maps, which improves detection precision. On the Pascal VOC 2007 test set, trained on the VOC 2007 and VOC 2012 training sets, MFRDet with a 300 \(\times \) 300 input achieves 80.7% mAP at 62.5 FPS. A high-resolution input version obtains 82.0% mAP at 37.0 FPS on a single Nvidia Tesla P100 GPU. The proposed framework thus delivers state-of-the-art mAP at high FPS, surpassing most other modern object detectors.
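The core idea the abstract describes, reusing shallow (high-resolution, detail-rich) and deep (low-resolution, semantically strong) feature maps together, can be illustrated with a minimal sketch. This is an illustrative assumption, not the paper's actual MFRDet architecture: the function names, nearest-neighbour upsampling, and element-wise addition as the fusion step are all hypothetical choices made only to show the general multi-level fusion pattern.

```python
# Hypothetical sketch of multi-level feature fusion (not the MFRDet design):
# a coarse "deep" feature map is upsampled to the resolution of a "shallow"
# map and combined element-wise, so the fused map carries both fine detail
# and high-level semantics. Feature maps are plain 2-D lists of floats.

def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D feature map by an integer factor."""
    out = []
    for row in fmap:
        # Repeat each value `factor` times along the width...
        wide = [v for v in row for _ in range(factor)]
        # ...then repeat the widened row `factor` times along the height.
        for _ in range(factor):
            out.append(list(wide))
    return out

def fuse(shallow, deep, factor):
    """Fuse a shallow map with an upsampled deep map by element-wise addition."""
    up = upsample_nearest(deep, factor)
    return [[s + d for s, d in zip(srow, drow)]
            for srow, drow in zip(shallow, up)]

# A 2x2 deep map (semantically strong) and a 4x4 shallow map (detail-rich).
deep = [[1.0, 2.0],
        [3.0, 4.0]]
shallow = [[0.1] * 4 for _ in range(4)]

fused = fuse(shallow, deep, 2)   # 4x4 map combining both levels
```

Real detectors fuse many levels (and use learned upsampling and concatenation rather than plain addition), but the resolution-matching step shown here is the common ingredient.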



Acknowledgements

This research was supported by the Youth Foundation of Hebei Province, China (No. E2018203162).

Author information

Correspondence to Wei Cui.


Cite this article

Wei, L., Cui, W., Hu, Z., et al.: A single-shot multi-level feature reused neural network for object detection. Vis. Comput. (2020). doi:10.1007/s00371-019-01787-3


Keywords

  • Object detection
  • Deep convolutional neural network
  • Feature reused