Graph-Based Scale-Aware Network for Human Parsing
Abstract
Recent work has made considerable progress in exploring contextual information for human parsing within the Fully Convolutional Network framework. However, two challenges remain: (1) modeling the inherent relative relationships between human parts; (2) handling the scale variation of human parts. To tackle both problems, we propose a Graph-Based Scale-Aware Network for human parsing. First, we embed a Graph-Based Part Reasoning Layer into the backbone network to reason about the relative relationships between human parts. Then we construct a Scale-Aware Context Embedding Layer, which consists of two branches that capture scale-specific contextual information using different receptive fields and scale-specific supervision. In addition, we adopt edge supervision to further improve performance. Extensive experimental evaluations demonstrate that the proposed model performs favorably against state-of-the-art human parsing methods. Specifically, our algorithm achieves 53.32% mean intersection-over-union (mIoU) on the LIP dataset.
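For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU can be computed from per-pixel labels. It is not the authors' evaluation code: the function name `mean_iou`, the flat label lists, and the choice to skip classes that appear in neither prediction nor ground truth are assumptions made for illustration. Benchmark evaluations typically accumulate these counts over the whole test set before averaging.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes for two flat label sequences.

    Illustrative helper (hypothetical name): `pred` and `target` hold one
    integer class label per pixel, each in the range [0, num_classes).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where both agree on class c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labeled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: classes absent from both are skipped
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with three hypothetical part classes.
    target = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    # Per-class IoU: 1/3, 2/3, 1/2, so the mean is 0.5.
    print(mean_iou(pred, target, num_classes=3))
```

In this toy case one pixel of class 0 is predicted as class 1 and one pixel of class 2 as class 0, which lowers the IoU of all three classes even though four of the six pixels are correct.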
Keywords
Human parsing, Segmentation, Graph-based reasoning, Scale-aware embedding
Notes
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61876210) and the Natural Science Foundation of Hubei Province (No. 2018CFB426).