Abstract
Running a reliable network on resource-limited platforms with low-resolution input images is a major challenge for heatmap-based human pose estimation (HPE). Two factors limit the accuracy of heatmap-based HPE on low-resolution images: the scale mismatch between the input image and the heatmaps, and the intrinsic quantization effect induced by the ‘argmax’ function. In this paper, we propose a coordinate-decoupled and offset-revised module (CDORM) to tackle these challenges. CDORM uses two coordinate-decoupled 1-D heatmaps to supervise the regression of the horizontal and vertical locations of human joints, and employs offset regression to alleviate the effect of quantization. CDORM can be integrated into any existing heatmap-based HPE network without significantly increasing the network size. Experimental results on the COCO and MPII datasets show that CDORM helps heatmap-based regression approaches achieve high estimation accuracy on low-resolution images while only slightly increasing the size and runtime of the network.
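The decoding problem the abstract describes can be illustrated with a small sketch. It assumes, hypothetically, that for one joint the network outputs one score per heatmap column (`x_heatmap`), one score per heatmap row (`y_heatmap`), and a sub-bin offset at every bin (`x_offset`, `y_offset`). These names, the Gaussian targets, the stride of 4, and the "ideal" offsets are illustrative assumptions, not the CDORM implementation. The sketch shows that plain argmax decoding can be off by up to stride / 2 pixels, and that adding the offset read at the peak bin recovers the sub-bin position. A side effect of decoupling is that a W x H heatmap needs W + H scores per joint instead of W * H (32 instead of 256 for 16 x 16 bins).

```python
import math
from typing import List, Tuple


def gaussian_1d(length: int, center: float, sigma: float = 1.0) -> List[float]:
    """1-D Gaussian target over `length` bins, peaked at a sub-bin `center`."""
    return [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(length)]


def argmax(values: List[float]) -> int:
    """Index of the largest score (the first one if several tie)."""
    return max(range(len(values)), key=lambda i: values[i])


def ideal_offsets(length: int, center: float) -> List[float]:
    """Hypothetical offsets a perfectly trained offset branch would output."""
    return [center - i for i in range(length)]


def decode_joint(
    x_heatmap: List[float],
    y_heatmap: List[float],
    x_offset: List[float],
    y_offset: List[float],
    stride: int,
) -> Tuple[float, float]:
    """Recover one joint's (x, y) in input-image pixels (assumed layout)."""
    ix, iy = argmax(x_heatmap), argmax(y_heatmap)
    # argmax alone returns an integer bin, so the result can be off by up to
    # half a bin, i.e. up to stride / 2 pixels in the input image.
    # Adding the offset stored at the peak bin restores the fractional part,
    # and multiplying by the stride maps heatmap units back to image pixels.
    return (ix + x_offset[ix]) * stride, (iy + y_offset[iy]) * stride


# Hypothetical example: 64 x 64 input, stride 4 -> 16 bins per axis.
stride, bins = 4, 16
x_true, y_true = 37.0, 21.0
cx, cy = x_true / stride, y_true / stride  # 9.25 and 5.25 in heatmap units
hx, hy = gaussian_1d(bins, cx), gaussian_1d(bins, cy)
zeros = [0.0] * bins

print(decode_joint(hx, hy, zeros, zeros, stride))  # (36.0, 20.0)
print(decode_joint(hx, hy, ideal_offsets(bins, cx), ideal_offsets(bins, cy), stride))  # (37.0, 21.0)
```

Without offsets, both coordinates land 1 pixel short of the true joint; with exact offsets, the true location is recovered. In a trained network the offsets are predictions, so the correction is only as good as the offset branch.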
Code Availability
The code will be published online soon.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (62173353), the Guangzhou Municipal People’s Livelihood Science and Technology Plan (201903010040), and the Science and Technology Program of Guangzhou, China (202007030011).
Author information
Contributions
Cailong Chi: Investigation, Methodology, Data Acquisition & Analysis, Visualization, Writing Original Draft. Dong Zhang: Funding Acquisition, Conceptualization, Data Analysis, Critically Revised, Data Curation. Zhesi Zhu: Methodology, Validation. Xingzhi Wang: Methodology. Dah-Jye Lee: Conceptualization, Critically Revised.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for Publication
All authors agreed with the content and gave explicit consent to submit.
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Availability of data and materials
The datasets analysed during the current study are available in the Common Objects in Context (COCO) repository, https://cocodataset.org/, and the MPII Human Pose Dataset, http://human-pose.mpi-inf.mpg.de/. The data generated during the current study are available from the corresponding author on reasonable request.
About this article
Cite this article
Chi, C., Zhang, D., Zhu, Z. et al. Human pose estimation for low-resolution image using 1-D heatmaps and offset regression. Multimed Tools Appl 82, 6289–6307 (2023). https://doi.org/10.1007/s11042-022-13468-w
DOI: https://doi.org/10.1007/s11042-022-13468-w
Keywords
- Human pose estimation
- Heatmap-based regression
- 1-D heatmap
- Offset regression