Unsupervised Domain Attention Adaptation Network for Caricature Attribute Recognition

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12353)

Abstract

Caricature attributes provide distinctive facial features that help research in Psychology and Neuroscience. However, unlike facial photo attribute datasets, which contain a large number of annotated images, annotations of caricature attributes are scarce. To facilitate research on attribute learning for caricatures, we propose a caricature attribute dataset, namely WebCariA. Moreover, to make use of models trained on face attributes, we propose a novel unsupervised domain adaptation framework for cross-modality (i.e., photos to caricatures) attribute recognition, with an integrated inter- and intra-domain consistency learning scheme. Specifically, the inter-domain consistency learning scheme consists of an image-to-image translator, which first bridges the domain gap between photos and caricatures by generating intermediate image samples, and a label consistency learning module, which aligns their semantic information. The intra-domain consistency learning scheme integrates the common feature consistency learning module with a novel attribute-aware attention-consistency learning module for a more efficient alignment. We conduct an extensive ablation study to show the effectiveness of the proposed method. The proposed method also outperforms the state-of-the-art methods by a margin. The implementation of the proposed method is available at https://github.com/KeleiHe/DAAN.
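As a rough illustration of the two kinds of consistency terms named in the abstract, the sketch below combines a label consistency term (a translated photo is assumed to keep the attribute labels of its source photo) with an attention consistency term (two attention maps are encouraged to agree). This is not the authors' implementation: the function names, the choice of binary cross-entropy and mean squared difference, the loss weights, and which pairs of samples are compared are all assumptions made for illustration. The released code at the URL above is the authoritative reference.

```python
import math


def label_consistency(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between predicted attribute
    probabilities and the (assumed inherited) binary labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def normalize(attn):
    """Scale a flattened, non-negative attention map to sum to 1."""
    s = sum(attn)
    if s <= 0:
        return [1.0 / len(attn)] * len(attn)
    return [a / s for a in attn]


def attention_consistency(attn_a, attn_b):
    """Mean squared difference between two normalized attention maps."""
    a, b = normalize(attn_a), normalize(attn_b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(probs, labels, attn_a, attn_b, w_label=1.0, w_attn=1.0):
    """Weighted sum of the two terms; the weights are hypothetical."""
    return (w_label * label_consistency(probs, labels)
            + w_attn * attention_consistency(attn_a, attn_b))


if __name__ == "__main__":
    # Toy values: three attributes, a 2x2 attention map flattened to 4 entries.
    probs = [0.9, 0.2, 0.6]
    labels = [1, 0, 1]
    attn_a = [0.1, 0.4, 0.4, 0.1]
    attn_b = [0.2, 0.3, 0.3, 0.2]
    print(combined_loss(probs, labels, attn_a, attn_b))
```

In a real framework these terms would be computed on network outputs and minimized jointly with the other objectives; the plain-Python form here only shows how such consistency terms can be expressed.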

Keywords

Unsupervised domain adaptation, Caricature, Attribute recognition, Attention

Notes

Acknowledgement

This work is supported in part by National Science Foundation of China under Grant No. 61806092, and in part by Jiangsu Natural Science Foundation under Grant No. BK20180326.

Supplementary material

504445_1_En_2_MOESM1_ESM.pdf (863 kb)
Supplementary material 1 (pdf 862 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. State Key Laboratory for Novel Software Technology, Nanjing, China
  2. Medical School of Nanjing University, Nanjing, China
  3. National Institute of Healthcare Data Science at Nanjing University, Nanjing, China
