FaSRnet: a feature and semantics refinement network for human pose estimation

  • Research Article
  • Published:
Frontiers of Information Technology & Electronic Engineering

Abstract

Multi-frame human pose estimation is a challenging task due to factors such as motion blur, video defocus, and occlusion. Exploiting the temporal consistency between consecutive frames is an effective way to address this issue. Most current methods exploit temporal consistency by refining the final heatmaps. The heatmaps contain the semantic information of keypoints and can improve detection quality to a certain extent; however, they are generated from features, and feature-level refinement is rarely considered. In this paper, we propose a human pose estimation framework that performs refinement at both the feature and semantic levels. At the feature level, we align auxiliary features with the features of the current frame to reduce the loss caused by different feature distributions, and then use an attention mechanism to fuse the auxiliary features with the current features. At the semantic level, we use the differences between adjacent heatmaps as auxiliary information to refine the current heatmaps. The method is validated on the large-scale benchmark datasets PoseTrack2017 and PoseTrack2018, and the results demonstrate its effectiveness.
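To make the two refinement stages described above concrete, the following is a minimal PyTorch sketch of the idea. It is written under stated assumptions: the module names (FeatureRefine, SemanticsRefine), the default channel and keypoint counts, and the simple convolution-based alignment and channel-attention fusion are illustrative choices, not the released FaSRnet implementation (available at https://github.com/Elvis-Aron/FaSRnet).

```python
# Hypothetical sketch only: module names, channel/joint counts, and the simple
# convolutional alignment + channel-attention fusion are illustrative assumptions,
# not the authors' released code (https://github.com/Elvis-Aron/FaSRnet).
import torch
import torch.nn as nn


class FeatureRefine(nn.Module):
    """Feature-level refinement: align auxiliary-frame features to the current
    frame, then fuse them with the current features via channel attention."""

    def __init__(self, channels: int = 48):  # example channel count
        super().__init__()
        # Stand-in alignment step: predict aligned auxiliary features from the
        # concatenation of current and auxiliary features.
        self.align = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
        # Channel-attention weights applied to the aligned auxiliary features.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, cur_feat: torch.Tensor, aux_feat: torch.Tensor) -> torch.Tensor:
        aligned = self.align(torch.cat([cur_feat, aux_feat], dim=1))
        weights = self.attn(torch.cat([cur_feat, aligned], dim=1))
        return self.fuse(torch.cat([cur_feat, weights * aligned], dim=1))


class SemanticsRefine(nn.Module):
    """Semantic-level refinement: use differences between adjacent-frame heatmaps
    as auxiliary cues to correct the current frame's heatmaps."""

    def __init__(self, num_joints: int = 15):  # example keypoint count
        super().__init__()
        self.refine = nn.Conv2d(3 * num_joints, num_joints, kernel_size=3, padding=1)

    def forward(self, cur_hm, prev_hm, next_hm):
        diff_prev = cur_hm - prev_hm   # change relative to the previous frame
        diff_next = next_hm - cur_hm   # change toward the next frame
        residual = self.refine(torch.cat([cur_hm, diff_prev, diff_next], dim=1))
        return cur_hm + residual       # refined heatmaps


if __name__ == "__main__":
    cur, aux = torch.randn(1, 48, 96, 72), torch.randn(1, 48, 96, 72)
    print(FeatureRefine(48)(cur, aux).shape)      # torch.Size([1, 48, 96, 72])
    hm = torch.randn(1, 15, 96, 72)
    print(SemanticsRefine(15)(hm, hm, hm).shape)  # torch.Size([1, 15, 96, 72])
```

In this sketch, the semantic stage adds a learned residual computed from heatmap differences, mirroring the abstract's description of using difference information between adjacent heatmaps; the actual network architecture is described in the full paper.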

Data availability

The code is available at https://github.com/Elvis-Aron/FaSRnet. The other data that support the findings of this study are available from the corresponding author upon reasonable request.

Author information

Contributions

Yuanhong ZHONG designed the research. Qianfeng XU and Daidi ZHONG processed the data. Yuanhong ZHONG, Qianfeng XU, and Daidi ZHONG drafted the paper. Xun YANG and Shanshan WANG helped organize the paper. All the authors revised and finalized the paper.

Corresponding author

Correspondence to Yuanhong Zhong (仲元红).

Ethics declarations

All the authors declare that they have no conflict of interest.

Additional information

Project supported by the National Key Research and Development Program of China (Nos. 2021YFC2009200 and 2023YFC3606100) and the Special Project of Technological Innovation and Application Development of Chongqing, China (No. cstc2019jscx-msxmX0167)

About this article

Cite this article

Zhong, Y., Xu, Q., Zhong, D. et al. FaSRnet: a feature and semantics refinement network for human pose estimation. Front Inform Technol Electron Eng 25, 513–526 (2024). https://doi.org/10.1631/FITEE.2200639

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1631/FITEE.2200639

Key words

CLC number
