PackDet: Packed Long-Head Object Detector

Ding, Kun; He, Guojin; Gu, Huxiang; Zhong, Zisha; Xiang, Shiming; Pan, Chunhong

doi:10.1007/978-3-030-58601-0_11

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12358)

Included in the following conference series:

European Conference on Computer Vision

Abstract

State-of-the-art object detectors exploit a multi-branch structure and predict objects at several different scales. Although this substantially boosts accuracy, low efficiency is inevitable because the fragmented structure is hardware-unfriendly. To solve this issue, we propose a packing operator (PackOp) that combines all head branches together spatially. Packed features are computationally more efficient and make it convenient to use cross-head group normalization (GN), leading to a notable accuracy improvement over the common head-separate GN. All of this comes at the cost of less than a 5.7% relative increase in runtime memory and the introduction of a few noisy training samples, whose side effects can be diminished by good packing pattern design. With PackOp, we propose a new anchor-free one-stage detector, PackDet, which features a single deeper/longer but narrower head, in contrast to the multiple shallow but wide heads of existing methods. Our best models on COCO test-dev achieve a better speed-accuracy balance: 35.1%, 42.3%, 44.0%, and 47.4% AP at 22.6, 16.9, 12.4, and 4.7 FPS using MobileNet-v2, ResNet-50, ResNet-101, and ResNeXt-101-DCN backbones, respectively. Code will be released (https://github.com/kding1225/PackDet).
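The idea of combining head branches spatially can be illustrated with a small sketch. The abstract does not describe the packing patterns PackDet actually uses, so the single-row layout below is only an assumption chosen for simplicity, and the names plan_row_packing, pack, unpack, and padding_overhead are hypothetical helpers rather than part of the released code. The sketch places several 2D feature maps of different sizes on one canvas, so that a single operation could process all of them, and then recovers each map from its recorded offset.

```python
from typing import List, Tuple

Shape = Tuple[int, int]      # (height, width) of one head's feature map
Placement = Tuple[int, int]  # (top, left) offset inside the packed canvas
Map2D = List[List[float]]


def plan_row_packing(shapes: List[Shape]) -> Tuple[Shape, List[Placement]]:
    """Lay maps out left to right in one row (an assumed, naive pattern)."""
    canvas_h = max(h for h, _ in shapes)
    placements: List[Placement] = []
    left = 0
    for _, w in shapes:
        placements.append((0, left))
        left += w
    return (canvas_h, left), placements


def pack(maps: List[Map2D], canvas_shape: Shape,
         placements: List[Placement], fill: float = 0.0) -> Map2D:
    """Copy every map into one canvas; uncovered cells keep the fill value."""
    canvas_h, canvas_w = canvas_shape
    canvas = [[fill] * canvas_w for _ in range(canvas_h)]
    for fmap, (top, left) in zip(maps, placements):
        for r, row in enumerate(fmap):
            canvas[top + r][left:left + len(row)] = row
    return canvas


def unpack(canvas: Map2D, shapes: List[Shape],
           placements: List[Placement]) -> List[Map2D]:
    """Slice each map back out of the canvas using its recorded offset."""
    return [
        [canvas[top + r][left:left + w] for r in range(h)]
        for (h, w), (top, left) in zip(shapes, placements)
    ]


def padding_overhead(shapes: List[Shape], canvas_shape: Shape) -> float:
    """Extra canvas area relative to the area the maps themselves occupy."""
    used = sum(h * w for h, w in shapes)
    total = canvas_shape[0] * canvas_shape[1]
    return (total - used) / used


# Three pyramid-like levels, each half the size of the previous one.
shapes = [(4, 4), (2, 2), (1, 1)]
maps = [[[float(i)] * w for _ in range(h)]
        for i, (h, w) in enumerate(shapes, start=1)]
canvas_shape, placements = plan_row_packing(shapes)
canvas = pack(maps, canvas_shape, placements)
restored = unpack(canvas, shapes, placements)
```

This naive row layout leaves a third of the occupied area as padding for the shapes above, far more than the memory increase reported in the abstract, which suggests why the choice of packing pattern matters. The padded cells also give a rough picture of where extra, uninformative positions could come from, although the abstract does not say how the noisy training samples arise.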

Notes

1. https://github.com/facebookresearch/maskrcnn-benchmark

Acknowledgement

This research was financially supported by the National Natural Science Foundation of China (61731022, 91646207) and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19090300). We would like to thank Rui Yang and Chaoyi Liu from EvaVisdom Tech for the inspiring discussions. We also thank the anonymous reviewers for their valuable suggestions.

Author information

Authors and Affiliations

Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
Kun Ding & Guojin He
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Huxiang Gu, Zisha Zhong, Shiming Xiang & Chunhong Pan
Beijing EvaVisdom Tech, Beijing, China
Huxiang Gu

Corresponding author

Correspondence to Kun Ding.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Cite this paper

Ding, K., He, G., Gu, H., Zhong, Z., Xiang, S., Pan, C. (2020). PackDet: Packed Long-Head Object Detector. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12358. Springer, Cham. https://doi.org/10.1007/978-3-030-58601-0_11

DOI: https://doi.org/10.1007/978-3-030-58601-0_11
Published: 28 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58600-3
Online ISBN: 978-3-030-58601-0
eBook Packages: Computer Science, Computer Science (R0)
