A Local Top-Down Module for Object Detection with Multi-scale Features

Part of the Lecture Notes in Computer Science book series (LNIP, volume 11259)


Object detection methods based on deep models and multi-scale features have achieved state-of-the-art performance. However, since each feature layer operates independently, several issues remain to be addressed, such as box-in-box detections and weaker performance on small objects. In this paper, we tackle these issues by integrating contextual and semantic information from higher-layer features into each prediction layer. Existing methods that adopt similar ideas mostly apply full top-down modules, which may increase the computational load significantly. Instead, we present an efficient yet general local top-down module, in which each prediction layer is integrated only with the upsampled features from its two succeeding layers. Experimental results show that the proposed algorithm performs favorably against state-of-the-art methods on the VOC, COCO and HollywoodHeads datasets while introducing little computational overhead. Compared with methods using full top-down modules, the proposed algorithm achieves comparable or higher accuracy while operating at a higher frame rate. The code is available at
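The wiring described in the abstract, in which each prediction layer is fused only with upsampled features from its two succeeding layers, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `upsample_nearest` and `local_top_down` are hypothetical, the feature maps are single-channel nested lists, nearest-neighbour resizing stands in for the learned deconvolution suggested by the keywords, and fusion by element-wise sum is an assumption made only for illustration.

```python
from typing import List

FeatureMap = List[List[float]]  # single-channel H x W map, for illustration only


def upsample_nearest(fmap: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Nearest-neighbour resize; an assumed stand-in for a learned deconvolution."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def local_top_down(pyramid: List[FeatureMap], reach: int = 2) -> List[FeatureMap]:
    """Fuse each layer with upsampled copies of up to `reach` succeeding layers."""
    fused = []
    for i, base in enumerate(pyramid):
        h, w = len(base), len(base[0])
        out = [row[:] for row in base]  # copy so the input pyramid is untouched
        # Only the next `reach` layers are used, never the whole pyramid.
        for higher in pyramid[i + 1 : i + 1 + reach]:
            up = upsample_nearest(higher, h, w)
            for y in range(h):
                for x in range(w):
                    out[y][x] += up[y][x]  # assumed fusion: element-wise sum
        fused.append(out)
    return fused


# Toy pyramid: constant maps of side 8, 4, 2, 1 holding 1, 10, 100, 1000.
pyramid = [
    [[v] * s for _ in range(s)]
    for s, v in [(8, 1.0), (4, 10.0), (2, 100.0), (1, 1000.0)]
]
fused = local_top_down(pyramid)
```

In this toy setting the finest layer receives contributions from the next two layers only (1 + 10 + 100), whereas a full top-down pathway would also propagate the coarsest layer's value down to it. The last two layers have fewer than two successors and are fused with whatever is available.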


  • Object detection
  • SSD
  • Deconvolution
  • Local top-down module

  • DOI: 10.1007/978-3-030-03341-5_6
  • Chapter length: 13 pages


This work is supported by the Natural Science Foundation of Liaoning Province, China, #20170540312.

Author information


Corresponding author

Correspondence to Lu Wang.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Huang, S., Wang, L., Yang, P., Deng, Q. (2018). A Local Top-Down Module for Object Detection with Multi-scale Features. In: , et al. Pattern Recognition and Computer Vision. PRCV 2018. Lecture Notes in Computer Science, vol 11259. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03340-8

  • Online ISBN: 978-3-030-03341-5

  • eBook Packages: Computer Science, Computer Science (R0)