PackDet: Packed Long-Head Object Detector

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12358)


State-of-the-art object detectors use a multi-branch structure and predict objects at several different scales. This substantially boosts accuracy, but low efficiency is inevitable because the fragmented structure is hardware-unfriendly. To solve this issue, we propose a packing operator (PackOp) that combines all head branches spatially. Packed features are computationally more efficient and make cross-head group normalization (GN) readily available, which notably improves accuracy over the common head-separate GN. These gains come at the cost of a relative increase of less than 5.7% in runtime memory and the introduction of a few noisy training samples, whose side effects can be diminished by well-designed packing patterns. With PackOp, we propose a new anchor-free one-stage detector, PackDet, which features a single deeper (longer) but narrower head, in contrast to the multiple shallow but wide heads of existing methods. Our best models on COCO test-dev achieve a better speed-accuracy balance: 35.1%, 42.3%, 44.0%, and 47.4% AP at 22.6, 16.9, 12.4, and 4.7 FPS using MobileNet-v2, ResNet-50, ResNet-101, and ResNeXt-101-DCN backbones, respectively. Code will be released.
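To make the idea of packing head inputs spatially more concrete, here is a minimal NumPy sketch. It is not the authors' PackOp: the function names `pack_levels` and `unpack_levels`, the layout (largest level at the top-left, the remaining levels stacked in a column to its right), and the example sizes are all assumptions chosen for illustration. It shows only that several per-scale feature maps can be placed on one canvas, so that a single head could process them in one pass, and then be recovered by slicing.

```python
import numpy as np


def pack_levels(features):
    """Pack per-level feature maps of shape (C, H_i, W_i) into one canvas.

    Hypothetical layout (an assumption, not the paper's packing pattern):
    the first level sits at the top-left and the remaining levels are
    stacked vertically in a column to its right. Unused cells stay zero.
    """
    channels = features[0].shape[0]
    h0, w0 = features[0].shape[1:]
    rest = features[1:]
    col_h = sum(f.shape[1] for f in rest)
    col_w = max((f.shape[2] for f in rest), default=0)
    canvas = np.zeros((channels, max(h0, col_h), w0 + col_w),
                      dtype=features[0].dtype)

    # Record where each level lives as (y, x, height, width).
    boxes = [(0, 0, h0, w0)]
    canvas[:, :h0, :w0] = features[0]
    y = 0
    for f in rest:
        h, w = f.shape[1:]
        canvas[:, y:y + h, w0:w0 + w] = f
        boxes.append((y, w0, h, w))
        y += h
    return canvas, boxes


def unpack_levels(canvas, boxes):
    """Slice the canvas back into the original per-level feature maps."""
    return [canvas[:, y:y + h, x:x + w] for (y, x, h, w) in boxes]


# Five pyramid levels whose side length halves at each step (assumed sizes).
rng = np.random.default_rng(0)
sizes = [(32, 32), (16, 16), (8, 8), (4, 4), (2, 2)]
levels = [rng.standard_normal((8, h, w)).astype(np.float32) for h, w in sizes]

packed, boxes = pack_levels(levels)
restored = unpack_levels(packed, boxes)
```

In this sketch the canvas contains some zero-filled cells that belong to no level. How much memory such padding costs depends on the layout, which is one reason the choice of packing pattern matters.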


Keywords: Object detection, Anchor-free, Packing features, Long head



This research was financially supported by the National Natural Science Foundation of China (61731022, 91646207) and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19090300). We would like to thank Rui Yang and Chaoyi Liu from EvaVisdom Tech for the inspiring discussions. We also thank the anonymous reviewers for their valuable suggestions.




Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
  2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  3. Beijing EvaVisdom Tech, Beijing, China
