Combining fractal hourglass network and skeleton joints pairwise affinity for multi-person pose estimation

Luo, Yanmin; Xu, Zhitong; Liu, Peizhong; Du, Yongzhao; Guo, Jingming

doi:10.1007/s11042-018-6502-7

Published: 10 August 2018

Volume 78, pages 7341–7363, (2019)

Multimedia Tools and Applications

Yanmin Luo^1,2, Zhitong Xu^1,2, Peizhong Liu^3, Yongzhao Du^3 & Jingming Guo^4


Abstract

Human pose estimation, and multi-person pose estimation in particular, is vital for understanding abnormal human behavior. In this paper, we develop a fractal hourglass model to automatically regress human body joints, and propose a layered double-way inference algorithm to calculate the affinity between neighboring skeleton joints. First, we replace the residual unit of the original hourglass network and describe the heatmap regression process for candidate skeleton joint locations. We then determine the specific body joint locations and optimize the regression results. Next, the double-way conditional probabilities between adjacent joints are defined as the joints' pairwise affinity, which is used to match adjacent body parts. In addition, we adopt a spatial distance constraint to refine the joint matching result. Finally, we connect the best-matching joint pairs and iterate the process until all candidate joints are assigned to individuals. Extensive experiments on the MPII multi-person subset and the COCO 2016 keypoints challenge show the effectiveness of our method, which outperforms the second-best method (Associative Embedding) by 0.45% and 1.20%, respectively.
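
The matching step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the sketch assumes that the two conditional probabilities are combined by a simple product, that the spatial constraint is a fixed pixel threshold, and that pairs are chosen greedily by descending score. The function name `match_adjacent_joints` and the parameters `cond_ab`, `cond_ba`, and `max_dist` are hypothetical.

```python
import math


def match_adjacent_joints(joints_a, joints_b, cond_ab, cond_ba, max_dist):
    """Greedily pair candidates of two adjacent joint types.

    joints_a, joints_b: lists of (x, y) candidate locations.
    cond_ab[i][j]: assumed estimate of P(b_j | a_i).
    cond_ba[j][i]: assumed estimate of P(a_i | b_j).
    max_dist: hypothetical spatial distance threshold (pixels).
    Returns a list of (i, j) index pairs, best score first.
    """
    scored = []
    for i, a in enumerate(joints_a):
        for j, b in enumerate(joints_b):
            # Spatial distance constraint: skip pairs that are too far apart.
            if math.dist(a, b) > max_dist:
                continue
            # Two-way affinity, assumed here to be the product of both
            # conditional probabilities.
            affinity = cond_ab[i][j] * cond_ba[j][i]
            scored.append((affinity, i, j))

    # Connect the best-scoring pair first, then repeat with the
    # remaining unassigned candidates.
    scored.sort(reverse=True)
    used_a, used_b, pairs = set(), set(), []
    for _, i, j in scored:
        if i in used_a or j in used_b:
            continue
        pairs.append((i, j))
        used_a.add(i)
        used_b.add(j)
    return pairs
```

Applying such a routine to each pair of adjacent joint types in turn (for example shoulder to elbow, then elbow to wrist) would assemble candidates into per-person skeletons; the order of traversal is likewise an assumption.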

References

Andriluka M, Pishchulin L, Gehler P, Schiele B (2014) 2D human pose estimation: new benchmark and state of the art analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3686–3693
Belagiannis V, Zisserman A (2017) Recurrent human pose estimation. In: Proceedings of the IEEE International Conference on Automatic Face & Gesture Recognition, pp 468–475
Cao Z, Simon T, Wei SE, Sheikh Y (2016) Realtime multi-person 2D pose estimation using part affinity fields. arXiv:1611.08050
Carreira J, Agrawal P, Fragkiadaki K, Malik J (2016) Human pose estimation with iterative error feedback. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4733–4742
Chen X, Yuille A (2014) Articulated pose estimation by a graphical model with image dependent pairwise relations. In: Proceedings of Advances in Neural Information Processing Systems, pp 1736–1744
Chu X, Yang W, Ouyang WL, Ma C, Yuille AL, Wang XG (2017) Multi-context attention for human pose estimation. arXiv:1702.07432
COCO Dataset. http://cocodataset.org/#keypoints-eval
Collobert R, Kavukcuoglu K, Farabet C (2011) Torch7: a matlab-like environment for machine learning. In: Proceedings of Advances in Neural Information Processing Systems
Fan X, Zheng K, Lin Y, Wang S (2015) Combining local appearance and holistic view: dual-source deep neural networks for human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1347–1355
Fang HS, Xie SQ, Tai YW, Lu CW (2016) RMPE: regional multi-person pose estimation. arXiv:1612.00137
Geng Y, Liang RZ, Li W, Wang J, Liang G, Xu C, Wang J (2016) Learning convolutional neural network to maximize pos@top performance measure. In: European Symposium on Artificial Neural Networks (ESANN), pp 589–594
Geng Y, Zhang G, Li W, Gu Y, Liang RZ, Liang G, Wang J, Wu Y, Patil N, Wang JY (2017) A novel image tag completion method based on convolutional neural transformation. In: International Conference on Artificial Neural Networks, pp 539–546
Guo Y, Tao D, Yu J, Xiong H, Li Y, Tao D (2016) Deep neural networks with relativity learning for facial expression recognition. In: IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp 1–6
He KM, Zhang XY, Ren SQ, Sun J (2015) Deep residual learning for image recognition. arXiv:1512.03385
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. arXiv:1703.06870
Insafutdinov E, Andriluka M, Pishchulin L, Tang S, Levinkov E, Andres B, Schiele B (2016) ArtTrack: articulated multi-person tracking in the wild. arXiv:1612.01465
Insafutdinov E, Pishchulin L, Andres B, Andriluka M, Schiele B (2016) DeeperCut: a deeper, stronger, and faster multi-person pose estimation model. In: European Conference on Computer Vision, pp 34–50
Iqbal U, Gall J (2016) Multi-person pose estimation with local joint-to-person associations. In: European Conference on Computer Vision, pp 627–642
Jain A, Tompson J, Andriluka M, Taylor GW, Bregler C (2013) Learning human pose estimation features with convolutional networks. Comput Sci
Ke SR, Zhu LJ, Hwang JN, Pai HI, Lan KM, Liao CP (2010) Real-time 3D human pose estimation from monocular view with applications to event detection and video gaming. In: Proceedings of Seventh IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp 489–496
Ke SR, Hwang JN, Lan KM, Wang SZ (2011) View-invariant 3D human body pose reconstruction using a monocular video camera. In: Fifth ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC), pp 1–6
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Proces Syst 25(2):1097–1105
Lin TY, Maire M, Belongie S, Bourdev L, Girshick R, Hays J, Perona P, Ramanan D, Zitnick CL, Dollár P (2014) Microsoft COCO: common objects in context. In: European Conference on Computer Vision, pp 740–755
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167
Neubeck A, Gool LV (2006) Efficient non-maximum suppression. In: International Conference on Pattern Recognition, pp 850–855
Newell A, Yang KY, Deng J (2016) Stacked hourglass networks for human pose estimation. In: European Conference on Computer Vision, pp 483–499
Newell A, Huang Z, Deng J (2016) Associative embedding: end-to-end learning for joint detection and grouping. arXiv:1611.05424
Pan Z, Liu S, Fu W (2017) A review of visual moving target tracking. Multimed Tools Appl 76(16):16989–17018
Papandreou G, Zhu T, Kanazawa N, Toshev A, Tompson J, Bregler C, Murphy K (2017) Towards accurate multi-person pose estimation in the wild. arXiv:1701.01779
Pishchulin L, Insafutdinov E, Tang S, Andres B, Andriluka M, Gehler P, Schiele B (2016) DeepCut: joint subset partition and labeling for multi person pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4929–4937
Popoola OP, Wang K (2012) Video-based abnormal human behavior recognition—a review. IEEE Trans on System Man & Cybern 42(6):865–878
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp 91–99
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. Comput Sci
Tao D, Cheng J, Song M, Lin X (2016) Manifold ranking-based matrix factorization for saliency detection. IEEE Transactions on Neural Networks and Learning Systems (TNNLS) 27(6):1122–1134
Tao D, Guo Y, Yu B, Pang J, Yu Z (2017) Deep multi-view feature learning for person re-identification. IEEE Trans Circuits Syst Video Technol (TCSVT) PP(99):1–1
Tieleman T, Hinton G (2017) Lecture 6.5-rmsprop: divide the gradient by a running average of its recent magnitude. In COURSERA: Neural Networks for Machine Learning, 4(2)
Tompson J, Jain A, Lecun Y, Bregler C (2014) Joint training of a convolutional network and a graphical model for human pose estimation. In: Proceedings of Advances in Neural Information Processing Systems, pp 1799–1807
Toshev A, Szegedy C (2013) DeepPose: human pose estimation via deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1653–1660
Wang C, Wang Y, Yuille AL (2013) An approach to pose-based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 915–922
Wang H, Oneata D, Verbeek J, Schmid C (2016) A robust and efficient video representation for action recognition. Int J Comput Vis 119(3):219–238
Xiao T, Li H, Ouyang W, Wang X (2016) Learning deep feature representations with domain guided dropout for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1249–1258
Yang Y, Ramanan D (2013) Articulated human detection with flexible mixtures of parts. IEEE Trans Pattern Anal Mach Intell 35(12):2878–2890
Yuan Y, Fang J, Wang Q (2015) Online anomaly detection in crowd scenes via structure analysis. IEEE Trans on Cybernetics 45(3):548–561
Zhang G, Liang G, Li W, Fang J, Wang J, Geng Y, Wang JY (2017) Learning convolutional ranking-score function by query preference regularization. In: International Conference on Intelligent Data Engineering and Automated Learning, pp 1–8

Acknowledgements

We would like to thank the authors of the MPII Human Pose dataset and the team members of the COCO 2016 Keypoint Challenge. We also thank the members of our laboratory for their assistance.

Funding

This work was supported by grants from the National Natural Science Foundation of China (Grant No. 61605048), the Talent Project of Huaqiao University (Grant No. 14BS215), and the Quanzhou Scientific and Technological Planning Projects of Fujian, China (Grant No. 2015Z120).

Author information

Authors and Affiliations

College of Computer Science and Technology, Huaqiao University, No. 668, Jimei Avenue, Xiamen, 361021, China
Yanmin Luo & Zhitong Xu
Key Laboratory for Computer Vision and Pattern Recognition of Xiamen City, Huaqiao University, No. 668, Jimei Avenue, Xiamen, 361021, China
Yanmin Luo & Zhitong Xu
College of Engineering, Huaqiao University, No. 269, Chenghua North Road, Quanzhou, 362021, China
Peizhong Liu & Yongzhao Du
Department of Electrical Engineering, National Taiwan University of Science and Technology, No. 43, Keelung Road, Da’an District, Taipei, 10607, Taiwan
Jingming Guo

Corresponding author

Correspondence to Zhitong Xu.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Luo, Y., Xu, Z., Liu, P. et al. Combining fractal hourglass network and skeleton joints pairwise affinity for multi-person pose estimation. Multimed Tools Appl 78, 7341–7363 (2019). https://doi.org/10.1007/s11042-018-6502-7

Received: 25 December 2017
Revised: 23 May 2018
Accepted: 02 August 2018
Published: 10 August 2018
Issue Date: March 2019
DOI: https://doi.org/10.1007/s11042-018-6502-7
