MOON: A Mixed Objective Optimization Network for the Recognition of Facial Attributes

  • Ethan M. Rudd
  • Manuel Günther
  • Terrance E. Boult
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9909)

Abstract

Attribute recognition, particularly facial, extracts many labels for each image. While some multi-task vision problems can be decomposed into separate tasks and stages, e.g., training independent models for each task, for a growing set of problems joint optimization across all tasks has been shown to improve performance. We show that for deep convolutional neural network (DCNN) facial attribute extraction, multi-task optimization is better. Unfortunately, it can be difficult to apply joint optimization to DCNNs when training data is imbalanced, and re-balancing multi-label data directly is structurally infeasible, since adding/removing data to balance one label will change the sampling of the other labels. This paper addresses the multi-label imbalance problem by introducing a novel mixed objective optimization network (MOON) with a loss function that mixes multiple task objectives with domain adaptive re-weighting of propagated loss. Experiments demonstrate that not only does MOON advance the state of the art in facial attribute recognition, but it also outperforms independently trained DCNNs using the same data. When using facial attributes for the LFW face recognition task, we show that our balanced (domain adapted) network outperforms the unbalanced trained network.
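The core idea — downweighting the loss contribution of the overrepresented class of each attribute so the weighted positive/negative mass matches a target distribution — can be sketched as below. This is a minimal NumPy illustration of one plausible re-weighting scheme, not the paper's exact formulation; the function names, the 50/50 default target, and the use of a squared-error loss on 0/1 labels are assumptions made here for illustration.

```python
import numpy as np

def balance_weights(labels, target=0.5, eps=1e-6):
    """Per-attribute loss weights for a binary multi-label matrix.

    labels: (N, M) array of 0/1 attribute labels (N samples, M attributes).
    For each attribute, the overrepresented class is downweighted so that
    the weighted positive mass : negative mass ratio equals target : (1-target).
    The underrepresented class keeps weight 1.
    """
    p = np.clip(labels.mean(axis=0), eps, 1.0 - eps)  # positive fraction per attribute
    odds = target / (1.0 - target)                    # desired pos/neg mass ratio
    w_pos = np.where(p > target, odds * (1.0 - p) / p, 1.0)
    w_neg = np.where(p < target, (1.0 / odds) * p / (1.0 - p), 1.0)
    # Broadcast the per-attribute class weights to a per-sample weight matrix.
    return labels * w_pos + (1.0 - labels) * w_neg

def weighted_squared_loss(pred, labels, weights):
    """Mean element-wise weighted squared error (a simple stand-in objective)."""
    return float(np.mean(weights * (pred - labels) ** 2))
```

Note that the re-weighting acts on the loss rather than on the data: resampling the images themselves cannot balance one attribute without disturbing the label distribution of all the others, which is exactly the structural infeasibility the abstract describes.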

Keywords

Facial attributes · Deep neural networks · Multi-task learning · Multi-label learning · Domain adaptation

Acknowledgments

This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA R&D Contract No. 2014-14071600012. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.

Supplementary material

Supplementary material 1: 419978_1_En_2_MOESM1_ESM.pdf (PDF, 5.8 MB)


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Ethan M. Rudd (1)
  • Manuel Günther (1)
  • Terrance E. Boult (1)

  1. Vision and Security Technology (VAST) Lab, University of Colorado at Colorado Springs, Colorado Springs, USA
