Abstract
Human detection and head detection have improved rapidly with the development of deep convolutional neural networks. However, the two tasks are usually studied separately, without exploiting the relationship between the human body and the head. In this paper, we present a new two-stage detection framework, Joint Enhancement Detection (JED), that detects humans and heads simultaneously using enhanced features. Specifically, JED adds two new modules: the Body Enhancement Module (BEM) and the Head Enhancement Module (HEM). The former enhances the features used for human detection, while the latter enhances the features used for head detection. With these enhanced features in a joint framework, the proposed method detects humans and heads simultaneously and efficiently. We verify the effectiveness of the proposed method on the CrowdHuman dataset, where it outperforms the baseline method on both human and head detection.
Acknowledgements
This work was supported by the Chinese National Natural Science Foundation Projects #61876178, #61806196, #61872367, #61572501.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Y., Zhang, S., Zhuang, C., Lei, Z. (2019). Feature Enhancement for Joint Human and Head Detection. In: Sun, Z., He, R., Feng, J., Shan, S., Guo, Z. (eds) Biometric Recognition. CCBR 2019. Lecture Notes in Computer Science, vol 11818. Springer, Cham. https://doi.org/10.1007/978-3-030-31456-9_56
DOI: https://doi.org/10.1007/978-3-030-31456-9_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31455-2
Online ISBN: 978-3-030-31456-9
eBook Packages: Computer Science, Computer Science (R0)