Cross-modal photo-caricature face recognition based on dynamic multi-task learning

Ming, Zuheng; Burie, Jean-Christophe; Luqman, Muhammad Muzzamil

doi:10.1007/s10032-021-00364-6

Special Issue Paper
Published: 16 March 2021

Volume 24, pages 33–48, (2021)

International Journal on Document Analysis and Recognition (IJDAR)

Zuheng Ming¹ (ORCID: 0000-0002-1094-3112), Jean-Christophe Burie¹ & Muhammad Muzzamil Luqman¹

Abstract

Face recognition on realistic visual images (e.g., photos) has been well studied and has made significant progress over the past decade. However, face recognition between photos and caricatures remains a challenging problem. Unlike photos, caricatures are drawn in varied artistic styles that introduce extreme non-rigid distortions. The resulting representational gap between the photo and caricature modalities is a major challenge for photo-caricature face recognition. In this paper, we propose to conduct cross-modal photo-caricature face recognition via multi-task learning, which learns the features of the different modalities with different tasks. Instead of setting the task weights manually, as in conventional multi-task learning, this work proposes a dynamic weights learning module that automatically generates (learns) the task weights according to the training importance of the tasks. The learned task weights enable the network to focus on training the hard tasks instead of being stuck overtraining the easy ones. The experimental results demonstrate the effectiveness of the proposed dynamic multi-task learning for cross-modal photo-caricature face recognition. The performance on the CaVI and WebCaricature datasets shows its superiority over state-of-the-art methods. The implementation code is available at https://github.com/hengxyz/cari-visual-recognition-via-multitask-learning.git.
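To make the idea of dynamic task weighting concrete, here is a minimal sketch. It is not the authors' module: the abstract does not specify how the weights are computed, so the sketch assumes that the weights come from a softmax over the current per-task losses, which gives harder tasks (larger loss) larger weights. The function names, the temperature parameter, and the example loss values are all hypothetical.

```python
import math


def dynamic_task_weights(task_losses, temperature=1.0):
    """Assumed rule: softmax over per-task losses, so a larger loss gets a larger weight."""
    scaled = [loss / temperature for loss in task_losses]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_multitask_loss(task_losses, temperature=1.0):
    """Combine per-task losses using the dynamically computed weights."""
    weights = dynamic_task_weights(task_losses, temperature)
    combined = sum(w * loss for w, loss in zip(weights, task_losses))
    return combined, weights


# Hypothetical losses for three tasks (illustrative values only).
losses = [0.2, 0.9, 1.5]
total, weights = weighted_multitask_loss(losses)
print("weights:", weights)
print("combined loss:", total)
```

In a real training loop the weights would be recomputed at every step, so the emphasis shifts as tasks become easier or harder. With manually fixed weights, by contrast, an easy task can keep dominating the combined loss.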

Author information

Authors and Affiliations

Laboratory L3i, La Rochelle University, 17402, La Rochelle, France
Zuheng Ming, Jean-Christophe Burie & Muhammad Muzzamil Luqman

Corresponding author

Correspondence to Zuheng Ming.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ming, Z., Burie, JC. & Luqman, M.M. Cross-modal photo-caricature face recognition based on dynamic multi-task learning. IJDAR 24, 33–48 (2021). https://doi.org/10.1007/s10032-021-00364-6

Received: 17 February 2020
Revised: 25 December 2020
Accepted: 24 February 2021
Published: 16 March 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s10032-021-00364-6
