End-to-end face parsing via interlinked convolutional neural networks

Abstract

Face parsing is an important computer vision task that requires accurate pixel-level segmentation of facial parts (such as the eyes, nose, and mouth), providing a basis for further face analysis, modification, and other applications. The Interlinked Convolutional Neural Network (iCNN) has proven to be an effective two-stage model for face parsing. However, the two stages of the original iCNN were trained separately, which limits its performance. To solve this problem, we introduce a simple, end-to-end face parsing framework, STN-aided iCNN (STN-iCNN), which extends the iCNN by adding a Spatial Transformer Network (STN) between the two isolated stages. The STN provides a trainable connection within the original two-stage iCNN pipeline, making end-to-end joint training possible. Moreover, as a by-product, the STN also provides more precise cropped parts than the original cropper. Owing to these two advantages, our approach significantly improves the accuracy of the original model. Our model achieved competitive performance on the Helen dataset, the standard face parsing dataset. It also achieved superior performance on the CelebAMask-HQ dataset, demonstrating good generalization. Our code has been released at https://github.com/aod321/STN-iCNN.
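
The idea of replacing a hard cropper with an STN can be illustrated with a minimal sketch. This is not the authors' implementation (their released code is at the URL above); it is a pure-Python toy showing the general mechanism a spatial transformer relies on: an affine matrix maps each output pixel to a real-valued location in the input, and bilinear interpolation reads the value there. Because the crop position enters only through continuous arithmetic, the output varies smoothly with the affine parameters, which is what allows gradients to flow between the two stages. The function names `bilinear_sample` and `affine_crop`, the normalized [-1, 1] coordinate convention, and the corner-aligned pixel mapping are all assumptions made for this illustration.

```python
import math


def bilinear_sample(image, x, y):
    """Read image (a list of rows) at real-valued pixel coords (x, y).

    Pixels outside the image are treated as 0.0.
    """
    height, width = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0  # fractional offsets in [0, 1)

    def pixel(r, c):
        return image[r][c] if 0 <= r < height and 0 <= c < width else 0.0

    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - fx) * pixel(y0, x0) + fx * pixel(y0, x0 + 1)
    bottom = (1 - fx) * pixel(y0 + 1, x0) + fx * pixel(y0 + 1, x0 + 1)
    return (1 - fy) * top + fy * bottom


def affine_crop(image, theta, out_h, out_w):
    """Crop an out_h x out_w patch using a 2x3 affine matrix theta.

    theta maps normalized output coords (u, v) in [-1, 1] to normalized
    input coords; -1 and 1 correspond to the first and last pixel centres
    (an assumed convention for this sketch).
    """
    height, width = len(image), len(image[0])
    patch = []
    for i in range(out_h):
        v = 2 * i / (out_h - 1) - 1 if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            u = 2 * j / (out_w - 1) - 1 if out_w > 1 else 0.0
            xs = theta[0][0] * u + theta[0][1] * v + theta[0][2]
            ys = theta[1][0] * u + theta[1][1] * v + theta[1][2]
            # Convert normalized input coords back to pixel coords.
            x = (xs + 1) * (width - 1) / 2
            y = (ys + 1) * (height - 1) / 2
            row.append(bilinear_sample(image, x, y))
        patch.append(row)
    return patch
```

With a 5x5 image, `theta = [[0.5, 0, 0], [0, 0.5, 0]]` and a 3x3 output select the central 3x3 region; changing the translation entry from 0 to 0.25 slides the window by half a pixel rather than jumping to the next integer position, which a conventional index-based cropper cannot do.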

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. U19B2034, 51975057, 601836014. The first author would like to thank Haoyu Liang and Aminul Huq for providing useful suggestions.

Author information

Corresponding author

Correspondence to Liang Tang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Yin, Z., Yiu, V., Hu, X. et al. End-to-end face parsing via interlinked convolutional neural networks. Cogn Neurodyn 15, 169–179 (2021). https://doi.org/10.1007/s11571-020-09615-4

Keywords

  • STN-iCNN
  • Face parsing
  • End-to-end