Adversarial Semantic Data Augmentation for Human Pose Estimation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12364)


Human pose estimation is the task of localizing body keypoints from still images. The state-of-the-art methods suffer from insufficient examples of challenging cases such as symmetric appearance, heavy occlusion and nearby person. To enlarge the amounts of challenging cases, previous methods augmented images by cropping and pasting image patches with weak semantics, which leads to unrealistic appearance and limited diversity. We instead propose Semantic Data Augmentation (SDA), a method that augments images by pasting segmented body parts with various semantic granularity. Furthermore, we propose Adversarial Semantic Data Augmentation (ASDA), which exploits a generative network to dynamically predict tailored pasting configuration. Given off-the-shelf pose estimation network as discriminator, the generator seeks the most confusing transformation to increase the loss of the discriminator while the discriminator takes the generated sample as input and learns from it. The whole pipeline is optimized in an adversarial manner. State-of-the-art results are achieved on challenging benchmarks. The code has been publicly available at


Pose estimation Semantic data augmentation 



This work was supported by the National Natural Science Foundation of China under grant 61871435 and the Fundamental Research Funds for the Central Universities no. 2019kfyXKJC024.


  1. 1.
    Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: CVPR, pp. 3686–3693 (2014)Google Scholar
  2. 2.
    Bulat, A., Tzimiropoulos, G.: Human pose estimation via convolutional part heatmap regression. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 717–732. Springer, Cham (2016). Scholar
  3. 3.
    Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., Sun, J.: Cascaded pyramid network for multi-person pose estimation. In: CVPR, pp. 7103–7112 (2018)Google Scholar
  4. 4.
    Chen, Y., Shen, C., Wei, X.S., Liu, L., Yang, J.: Adversarial posenet: a structure-aware convolutional network for human pose estimation. In: ICCV, pp. 1212–1221 (2017)Google Scholar
  5. 5.
    Chu, W., Hung, W.C., Tsai, Y.H., Cai, D., Yang, M.H.: Weakly-supervised caricature face parsing through domain adaptation. ICIP (2019)Google Scholar
  6. 6.
    Chu, X., Yang, W., Ouyang, W., Ma, C., Yuille, A.L., Wang, X.: Multi-context attention for human pose estimation. In: CVPR, pp. 1831–1840 (2017)Google Scholar
  7. 7.
    Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR, pp. 248–255 (2009)Google Scholar
  8. 8.
    Fieraru, M., Khoreva, A., Pishchulin, L., Schiele, B.: Learning to refine human pose estimation. In: CVPR Workshops, pp. 205–214 (2018)Google Scholar
  9. 9.
    Gong, K., Liang, X., Zhang, D., Shen, X., Lin, L.: Look into person: self-supervised structure-sensitive learning and a new benchmark for human parsing. In: CVPR, pp. 932–940 (2017)Google Scholar
  10. 10.
    Goodfellow, I., et al.: Generative adversarial nets. In: NIPS, pp. 2672–2680 (2014)Google Scholar
  11. 11.
    Insafutdinov, E., Pishchulin, L., Andres, B., Andriluka, M., Schiele, B.: DeeperCut: a deeper, stronger, and faster multi-person pose estimation model. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 34–50. Springer, Cham (2016). Scholar
  12. 12.
    Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: BMVC, vol. 2, p. 5 (2010)Google Scholar
  13. 13.
    Ke, L., Chang, M.-C., Qi, H., Lyu, S.: Multi-scale structure-aware network for human pose estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 731–746. Springer, Cham (2018). Scholar
  14. 14.
    Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. ICLRGoogle Scholar
  15. 15.
    Li, W., et al.: Rethinking on multi-stage networks for human pose estimation. arXiv preprint arXiv:1901.00148 (2019)
  16. 16.
    Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). Scholar
  17. 17.
    Liu, T., et al.: Devil in the details: towards accurate single and multiple human parsing. arXiv preprint arXiv:1809.05996 (2018)
  18. 18.
    Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 483–499. Springer, Cham (2016). Scholar
  19. 19.
    Nie, X., Feng, J., Zuo, Y., Yan, S.: Human pose estimation with parsing induced learner. In: CVPR (2018)Google Scholar
  20. 20.
    Ning, G., Zhang, Z., He, Z.: Knowledge-guided deep fractal neural networks for human pose estimation. IEEE Trans. Multimed. 20(5), 1246–1259 (2018)CrossRefGoogle Scholar
  21. 21.
    Peng, X., Tang, Z., Yang, F., Feris, R.S., Metaxas, D.: Jointly optimize data augmentation and network training: Adversarial data augmentation in human pose estimation. In: CVPR (2018)Google Scholar
  22. 22.
    Su, Z., Ye, M., Zhang, G., Dai, L., Sheng, J.: Cascade feature aggregation for human pose estimation. arXiv preprint arXiv:1902.07837 (2019)
  23. 23.
    Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. arXiv preprint arXiv:1902.09212 (2019)
  24. 24.
    Tang, W., Wu, Y.: Does learning specific features for related parts help human pose estimation? In: CVPR, pp. 1107–1116 (2019)Google Scholar
  25. 25.
    Tang, W., Yu, P., Wu, Y.: Deeply learned compositional models for human pose estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 197–214. Springer, Cham (2018). Scholar
  26. 26.
    Tompson, J.J., Jain, A., LeCun, Y., Bregler, C.: Joint training of a convolutional network and a graphical model for human pose estimation. In: NIPS, pp. 1799–1807 (2014)Google Scholar
  27. 27.
    Toshev, A., Szegedy, C.: DeepPose: human pose estimation via deep neural networks. In: CVPR, pp. 1653–1660 (2014)Google Scholar
  28. 28.
    Wang, X., Shrivastava, A., Gupta, A.: A-fast-RCNN: hard positive generation via adversary for object detection. In: CVPR, pp. 2606–2615 (2017)Google Scholar
  29. 29.
    Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: CVPR. pp. 4724–4732 (2016)Google Scholar
  30. 30.
    Xiao, B., Wu, H., Wei, Y.: Simple baselines for human pose estimation and tracking. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 472–487. Springer, Cham (2018). Scholar
  31. 31.
    Yang, W., Li, S., Ouyang, W., Li, H., Wang, X.: Learning feature pyramids for human pose estimation. In: ICCV, pp. 1281–1290 (2017)Google Scholar
  32. 32.
    Yu, A., Grauman, K.: Semantic jitter: dense supervision for visual comparisons via synthetic images. In: ICCV, pp. 5570–5579 (2017)Google Scholar
  33. 33.
    Zhang, H., et al.: Human pose estimation with spatial contextual information. arXiv preprint arXiv:1901.01760 (2019)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and AutomationHuazhong University of Science and TechnologyWuhanChina
  2. 2.Tencent Youtu LabShanghaiChina

Personalised recommendations