
Feature context learning for human parsing

  • Tengteng Huang
  • Yongchao Xu
  • Song Bai
  • Yongpan Wang
  • Xiang Bai
Research Paper

Abstract

Parsing inconsistency, meaning scattered fragments and speckles in the parsing results as well as imprecise contours, is a long-standing problem in human parsing. It arises because the pixel-wise classification loss treats each pixel independently. To address this issue, we propose an end-to-end trainable, highly flexible, and generic module called the feature context module (FCM). FCM exploits the correlation between adjacent pixels and aggregates the contextual information embedded in the real topology of the human body. The feature representations are therefore enhanced and become robust in distinguishing semantically related parts. Extensive experiments with three different backbone models on four benchmark datasets suggest that FCM can serve as an effective and efficient plug-in that consistently improves the performance of existing algorithms without significantly sacrificing inference speed.
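The abstract's two technical points can be illustrated with a small NumPy sketch. It rests on several assumptions. Logits and features are stored as `(channels, height, width)` arrays. The function names `pixelwise_cross_entropy` and `aggregate_neighbors` are hypothetical. The aggregation step is a generic similarity-weighted 4-neighbor average, chosen only to show what "aggregating context from adjacent pixels" can mean. It is not the authors' FCM, whose design is not described in this abstract.

```python
import numpy as np


def pixelwise_cross_entropy(logits, labels):
    """Per-pixel softmax cross-entropy, the usual FCN-style parsing loss.

    logits: (C, H, W) unnormalized class scores; labels: (H, W) integer class ids.
    Each pixel's term depends only on that pixel's logits and label, so the
    loss never checks whether neighboring predictions agree.
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.indices(labels.shape)
    per_pixel = -log_probs[labels, rows, cols]  # (H, W) independent terms
    return per_pixel.mean(), per_pixel


def aggregate_neighbors(features):
    """Hypothetical similarity-weighted 4-neighbor aggregation (NOT the paper's FCM).

    features: (D, H, W). Each output vector is a softmax-weighted average of
    the pixel itself and its up/down/left/right neighbors, weighted by
    dot-product similarity. Border pixels reuse their own value (edge padding).
    """
    _, h, w = features.shape
    padded = np.pad(features, ((0, 0), (1, 1), (1, 1)), mode="edge")
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    neighbors = np.stack(
        [padded[:, 1 + dy:1 + dy + h, 1 + dx:1 + dx + w] for dy, dx in offsets]
    )  # (5, D, H, W)
    scores = (neighbors * features[None]).sum(axis=1)  # (5, H, W) similarities
    scores -= scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)  # weights sum to 1 per pixel
    return (weights[:, None] * neighbors).sum(axis=0)  # (D, H, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(3, 4, 4))
    labels = rng.integers(0, 3, size=(4, 4))
    _, before = pixelwise_cross_entropy(logits, labels)
    logits[0, 1, 1] += 5.0  # perturb one class score at a single pixel
    _, after = pixelwise_cross_entropy(logits, labels)
    print((~np.isclose(before, after)).sum())  # only that pixel's term changes

    feats = np.zeros((2, 3, 3))
    feats[0] = 1.0                 # a uniform "part" region
    feats[:, 1, 1] = [0.2, 0.1]    # one outlier pixel, i.e. a speckle
    print(feats[:, 1, 1], aggregate_neighbors(feats)[:, 1, 1])
```

The first demo shows why the loss alone cannot discourage speckles: altering one pixel's prediction changes only that pixel's term. The second shows that even a simple neighborhood aggregation pulls an outlier pixel's features toward those of its surrounding region. Any learned module in the spirit of FCM would replace the fixed dot-product weights with trainable ones.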

Keywords

human parsing · context learning · fully convolutional networks · graph convolutional network · semantic segmentation

Notes

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China (Grant No. 2018YFB1004600), the National Natural Science Foundation of China (Grant No. 61703171), and the Natural Science Foundation of Hubei Province of China (Grant No. 2018CFB199). This work was also supported by Alibaba Group through the Alibaba Innovative Research (AIR) Program. The work of Yongchao Xu was supported by the Young Elite Scientists Sponsorship Program by CAST. The work of Xiang Bai was supported by the National Program for Support of Top-Notch Young Professionals and in part by the Program for HUST Academic Frontier Youth Team.

Copyright information

© Science China Press and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Tengteng Huang (1)
  • Yongchao Xu (1, corresponding author)
  • Song Bai (1)
  • Yongpan Wang (2)
  • Xiang Bai (1)

  1. School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
  2. Alibaba Group, Hangzhou, China
