Face parsing is an important computer vision task that requires accurate pixel-level segmentation of facial parts (such as the eyes, nose, and mouth), providing a basis for further face analysis, modification, and other applications. The Interlinked Convolutional Neural Network (iCNN) has proven to be an effective two-stage model for face parsing. However, the two stages of the original iCNN were trained separately, which limits its performance. To solve this problem, we introduce a simple, end-to-end face parsing framework, STN-aided iCNN (STN-iCNN), which extends the iCNN by adding a Spatial Transformer Network (STN) between the two isolated stages. The STN provides a trainable connection within the original two-stage iCNN pipeline, making end-to-end joint training possible. Moreover, as a by-product, the STN also provides more precise cropped parts than the original cropper. Owing to these two advantages, our approach significantly improves the accuracy of the original model. Our model achieved competitive performance on the Helen dataset, the standard face parsing dataset. It also achieved superior performance on the CelebAMask-HQ dataset, demonstrating good generalization. Our code has been released at https://github.com/aod321/STN-iCNN.
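To illustrate why an STN can serve as a trainable connection between the two stages, the following is a minimal sketch of the general spatial transformer cropping mechanism: an affine sampling grid followed by bilinear interpolation. It is not the authors' implementation (see the released repository for that). The function names `affine_grid`, `bilinear_sample`, and `stn_crop` and the example matrix `theta` are hypothetical; in a real model, `theta` would be predicted by a localization network. Because every output pixel is a weighted sum whose weights vary continuously with `theta`, gradients can in principle flow from the second stage back through the crop, which a hard, index-based cropper does not allow.

```python
import math


def affine_grid(theta, out_h, out_w):
    """Map each output pixel to normalized source coordinates in [-1, 1].

    theta is a 2x3 affine matrix [[a, b, tx], [c, d, ty]]. Here it is given
    by hand; in an STN it would come from a localization network (assumption).
    """
    grid = []
    for i in range(out_h):
        # Normalized target y coordinate, with corners aligned to -1 and 1.
        yt = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            xt = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sample a 2D image (list of rows) at the grid's source coordinates."""
    h, w = len(image), len(image[0])
    out = []
    for row in grid:
        out_row = []
        for xs, ys in row:
            # Convert normalized coordinates back to pixel coordinates.
            x = (xs + 1.0) * (w - 1) / 2.0
            y = (ys + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(x), math.floor(y)
            value = 0.0
            # Blend the four neighbouring pixels; out-of-range ones count as 0.
            for yy in (y0, y0 + 1):
                for xx in (x0, x0 + 1):
                    if 0 <= xx < w and 0 <= yy < h:
                        weight = (1.0 - abs(x - xx)) * (1.0 - abs(y - yy))
                        value += weight * image[yy][xx]
            out_row.append(value)
        out.append(out_row)
    return out


def stn_crop(image, theta, out_h, out_w):
    """Crop (and resample) a region of the image selected by theta."""
    return bilinear_sample(image, affine_grid(theta, out_h, out_w))


# Usage: a scale of 0.5 with no translation selects the central region.
image = [[float(5 * y + x) for x in range(5)] for y in range(5)]
theta = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
patch = stn_crop(image, theta, 3, 3)
```

Deep learning frameworks provide the same two steps as built-in differentiable operations; for example, PyTorch exposes them as `torch.nn.functional.affine_grid` and `torch.nn.functional.grid_sample`, whose normalized-coordinate convention this sketch follows (with corners aligned).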
This work was supported by the National Natural Science Foundation of China under Grant Nos. U19B2034, 51975057, and 601836014. The first author would like to thank Haoyu Liang and Aminul Huq for providing useful suggestions.
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Yin, Z., Yiu, V., Hu, X. et al. End-to-end face parsing via interlinked convolutional neural networks. Cogn Neurodyn 15, 169–179 (2021). https://doi.org/10.1007/s11571-020-09615-4
- Face parsing