Combining fractal hourglass network and skeleton joints pairwise affinity for multi-person pose estimation

Multimedia Tools and Applications

Abstract

Human pose estimation, especially multi-person pose estimation, is vital for understanding abnormal human behavior. In this paper, we develop a fractal hourglass model to automatically regress human body joints, and propose a layered double-way inference algorithm to calculate the affinity between neighboring skeleton joints. First, we replace the original hourglass residual unit and describe the heatmap regression process for candidate skeleton joint locations. We then determine the specific body joint locations and optimize the regression results. Next, the double-way conditional probabilities between adjacent joints are defined as the joints' pairwise affinity, which is used to match adjacent human body parts. In addition, we adopt a spatial distance constraint to refine the joint matching result. Finally, we connect the best-matching joint pair and iterate the process until all candidate joints are assigned to individuals. Extensive experiments on the MPII multi-person subset and the COCO 2016 keypoints challenge show the effectiveness of our method, which outperforms the second-best method (Associative Embedding) by 0.45 and 1.20%, respectively.
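
The matching stage summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function name `match_joints`, the product used to combine the two directional probabilities into one affinity, and the hard cutoff `max_dist` standing in for the spatial distance constraint are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def match_joints(p_fwd, p_bwd, pos_a, pos_b, max_dist):
    """Greedily pair candidates of joint type A with candidates of type B.

    p_fwd[i, j]: assumed score of B_j given A_i (forward direction).
    p_bwd[i, j]: assumed score of A_i given B_j (backward direction).
    pos_a, pos_b: (n, 2) and (m, 2) arrays of candidate image coordinates.
    max_dist: hypothetical cutoff for the spatial distance constraint.
    """
    # Assumption: the two directional scores are combined by a product.
    affinity = np.asarray(p_fwd, dtype=float) * np.asarray(p_bwd, dtype=float)

    # Spatial distance constraint: discard pairs that are too far apart.
    diff = np.asarray(pos_a)[:, None, :] - np.asarray(pos_b)[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    affinity = np.where(dist <= max_dist, affinity, 0.0)

    # Connect the best remaining pair, then remove both candidates,
    # and repeat until no admissible pair is left.
    pairs = []
    while affinity.size and affinity.max() > 0:
        i, j = np.unravel_index(np.argmax(affinity), affinity.shape)
        pairs.append((int(i), int(j)))
        affinity[i, :] = 0.0
        affinity[:, j] = 0.0
    return pairs
```

For two people, the function would be called once per pair of adjacent joint types (for example, candidate elbows against candidate wrists), and the resulting pairs would be chained across joint types to assemble each skeleton. A greedy loop is used here only because the abstract describes connecting the best-matching pair and iterating; other assignment strategies would fit the same interface.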

Acknowledgements

We would like to thank the authors of the MPII human pose dataset and the team members of the COCO 2016 Keypoint Challenge. We also thank the members of our laboratory for their assistance.

Funding

This work was supported by grants from the National Natural Science Foundation of China (Grant No. 61605048), the Talent Project of Huaqiao University (Grant No. 14BS215), and the Quanzhou Scientific and Technological Planning Projects of Fujian, China (Grant No. 2015Z120).

Author information

Corresponding author

Correspondence to Zhitong Xu.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Luo, Y., Xu, Z., Liu, P. et al. Combining fractal hourglass network and skeleton joints pairwise affinity for multi-person pose estimation. Multimed Tools Appl 78, 7341–7363 (2019). https://doi.org/10.1007/s11042-018-6502-7

  • DOI: https://doi.org/10.1007/s11042-018-6502-7
