EfficientPose: Efficient human pose estimation with neural architecture search


  • Research Article
  • Open Access
  • Published: 07 April 2021


  • Wenqiang Zhang (affiliation 1; equal contribution),
  • Jiemin Fang (affiliations 2 and 1; equal contribution),
  • Xinggang Wang (affiliation 1) &
  • Wenyu Liu (affiliation 1)

Computational Visual Media volume 7, pages 335–347 (2021)

  • 1144 Accesses

  • 19 Citations


Abstract

Human pose estimation from images and videos is a key task in many multimedia applications. Previous methods achieve strong accuracy but rarely take efficiency into consideration, which makes it difficult to deploy the networks on lightweight devices. Real-time multimedia applications now call for more efficient models that support better interaction. Moreover, most deep neural networks for pose estimation directly reuse networks designed for image classification as their backbones, and these backbones are not optimized for the pose estimation task. In this paper, we propose an efficient framework for human pose estimation consisting of two parts: an efficient backbone and an efficient head. By applying a differentiable neural architecture search method, we customize the backbone network design for pose estimation and reduce the computational cost with negligible accuracy degradation. For the efficient head, we slim the transposed convolutions and propose a spatial information correction module to improve the final prediction. We evaluate our networks on the MPII and COCO datasets. Our smallest model requires only 0.65 GFLOPs while achieving 88.1% PCKh@0.5 on MPII, and our large model requires only 2 GFLOPs while achieving accuracy competitive with the state-of-the-art large model HRNet, which takes 9.5 GFLOPs.
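The abstract names the backbone search strategy only as differentiable neural architecture search. As a minimal sketch of the general idea (the DARTS-style continuous relaxation, assumed here rather than taken from this paper), each searchable position outputs a softmax-weighted sum of candidate operations, the architecture weights `alpha` are learned by gradient descent, and the operation with the largest weight is kept once the search ends. The candidate operations, the `OP_COST` values, and the expected-cost term are hypothetical; the cost term only illustrates how a search can be steered toward cheaper operations, and whether EfficientPose uses this exact formulation is not stated here.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


# Hypothetical candidate operations on a 1-D feature vector. A real search
# space would hold convolutional blocks (e.g. different kernel sizes).
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "avg3": lambda x: np.convolve(x, np.ones(3) / 3.0, mode="same"),
    "zero": lambda x: np.zeros_like(x),
}
# Hypothetical per-operation cost (e.g. MFLOPs); illustrative values only.
# Keys are in the same order as CANDIDATE_OPS, so both line up with alpha.
OP_COST = {"identity": 0.0, "avg3": 3.0, "zero": 0.0}


def mixed_op(x, alpha):
    """Continuous relaxation: softmax(alpha)-weighted sum of every candidate's output."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def expected_cost(alpha):
    """Differentiable cost estimate: softmax(alpha)-weighted sum of per-op costs."""
    weights = softmax(alpha)
    return float(sum(w * c for w, c in zip(weights, OP_COST.values())))


def derive_op(alpha):
    """After the search, keep the single operation with the largest architecture weight."""
    return list(CANDIDATE_OPS)[int(np.argmax(alpha))]


if __name__ == "__main__":
    x = np.array([1.0, 2.0, 3.0, 4.0])
    alpha = np.array([0.1, 1.5, -0.5])  # architecture parameters (learned in practice)
    print(mixed_op(x, alpha))
    print(expected_cost(alpha))
    print(derive_op(alpha))
```

With `alpha = [0.1, 1.5, -0.5]`, `derive_op` keeps `avg3`. In a full DARTS-style search the same softmax weights would scale convolutional feature maps, and `alpha` would be updated in alternation with the ordinary network weights.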
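The MPII result above is reported as PCKh@0.5: a predicted joint counts as correct when its distance to the ground-truth joint is at most 0.5 times the person's head segment length, and the score is the fraction of annotated joints that are correct. A minimal sketch of that computation follows. The function name `pckh` and the array layout are assumptions, and the head size is taken as a given input (the official MPII evaluation derives it from the annotated head bounding box).

```python
import numpy as np


def pckh(pred, gt, head_size, visible, alpha=0.5):
    """PCKh@alpha: fraction of annotated joints predicted within alpha * head size.

    pred, gt:  arrays of shape (N, K, 2) with pixel coordinates of K joints for N people
    head_size: array of shape (N,) with each person's head segment length in pixels
    visible:   boolean array of shape (N, K), True where a ground-truth joint is annotated
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    head_size = np.asarray(head_size, dtype=float)
    visible = np.asarray(visible, dtype=bool)
    if not visible.any():
        raise ValueError("no annotated joints to evaluate")

    dist = np.linalg.norm(pred - gt, axis=-1)   # (N, K) Euclidean errors in pixels
    normalized = dist / head_size[:, None]      # errors in units of each person's head size
    correct = (normalized <= alpha) & visible   # only annotated joints can count
    return float(correct.sum() / visible.sum())


if __name__ == "__main__":
    gt = np.zeros((1, 3, 2))
    pred = np.array([[[3.0, 4.0], [0.0, 6.0], [1.0, 0.0]]])  # errors of 5, 6 and 1 px
    vis = np.ones((1, 3), dtype=bool)
    print(pckh(pred, gt, head_size=np.array([10.0]), visible=vis))  # 2 of 3 joints within 5 px
```

With a 10 px head size the threshold is 5 px, so the joints with 5 px and 1 px errors count as correct and the 6 px joint does not.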




Acknowledgements

This work was in part supported by National Natural Science Foundation of China (NSFC) (Nos. 61733007 and 61876212) and Zhejiang Lab (No. 2019NB0AB02).

Author information

Author notes
  1. Wenqiang Zhang and Jiemin Fang contributed equally to this work.

Authors and Affiliations

  1. School of Electronic Information and Communications (EIC), Huazhong University of Science and Technology, Wuhan, 430074, China

    Wenqiang Zhang, Jiemin Fang, Xinggang Wang & Wenyu Liu

  2. Institute of Artificial Intelligence, Huazhong University of Science and Technology, Wuhan, 430074, China

    Jiemin Fang


Corresponding author

Correspondence to Xinggang Wang.

Additional information

Wenqiang Zhang is a master's student in the School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan. His research interests include pose estimation and neural architecture search.

Jiemin Fang received his B.E. degree from the School of Electronic Information and Communications, Huazhong University of Science and Technology in 2018. He is currently a Ph.D. candidate at the Institute of Artificial Intelligence and the School of Electronic Information and Communications, Huazhong University of Science and Technology. His research interests include AutoML and efficient deep learning.

Xinggang Wang received his B.S. and Ph.D. degrees in electronics and information engineering from Huazhong University of Science and Technology, in 2009 and 2014, respectively. He is currently an associate professor with the School of Electronic Information and Communications, HUST. His research interests include computer vision and machine learning.

Wenyu Liu received his B.S. degree in computer science from Tsinghua University, Beijing, China, in 1986, and his M.S. and Ph.D. degrees, both in electronics and information engineering, from Huazhong University of Science and Technology (HUST), in 1991 and 2001, respectively. He is now a professor and associate dean of the School of Electronic Information and Communications, HUST. His current research areas include computer vision, multimedia, and machine learning.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorial-manager.com/cvmj.


About this article


Cite this article

Zhang, W., Fang, J., Wang, X. et al. EfficientPose: Efficient human pose estimation with neural architecture search. Comp. Visual Media 7, 335–347 (2021). https://doi.org/10.1007/s41095-021-0214-z


  • Received: 11 December 2020

  • Accepted: 16 February 2021

  • Published: 07 April 2021

  • Issue Date: September 2021

  • DOI: https://doi.org/10.1007/s41095-021-0214-z


Keywords

  • pose estimation
  • neural architecture search
  • efficient deep learning