Facial Landmark Detection by Deep Multi-task Learning

  • Zhanpeng Zhang
  • Ping Luo
  • Chen Change Loy
  • Xiaoou Tang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8694)


Facial landmark detection has long been impeded by the problems of occlusion and pose variation. Instead of treating the detection task as a single and independent problem, we investigate the possibility of improving detection robustness through multi-task learning. Specifically, we wish to optimize facial landmark detection together with heterogeneous but subtly correlated tasks, e.g. head pose estimation and facial attribute inference. This is non-trivial since different tasks have different learning difficulties and convergence rates. To address this problem, we formulate a novel tasks-constrained deep model, with task-wise early stopping to facilitate learning convergence. Extensive evaluations show that the proposed task-constrained learning (i) outperforms existing methods, especially in dealing with faces with severe occlusion and pose variation, and (ii) reduces model complexity drastically compared to the state-of-the-art method based on a cascaded deep model [21].
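The idea of task-wise early stopping can be illustrated with a minimal sketch. The abstract does not state the stopping criterion, so the patience rule used here (stop an auxiliary task once its validation loss has failed to improve for a few epochs) is a stand-in assumption, not the authors' criterion. The names `TaskState` and `joint_loss`, the task names, the weights, and the loss values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TaskState:
    """Bookkeeping for one auxiliary task (hypothetical helper)."""
    weight: float                   # importance of the task in the joint loss
    patience: int = 3               # assumed rule: epochs without improvement
    best_val: float = float("inf")  # best validation loss seen so far
    stale_epochs: int = 0           # consecutive epochs without improvement
    active: bool = True             # False once the task has been stopped

    def update(self, val_loss: float) -> None:
        """Record a validation loss and stop the task once it stalls."""
        if not self.active:
            return
        if val_loss < self.best_val:
            self.best_val = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.active = False


def joint_loss(main_loss: float,
               aux_losses: Dict[str, float],
               tasks: Dict[str, TaskState]) -> float:
    """Main-task loss plus weighted losses of the still-active auxiliary tasks."""
    return main_loss + sum(
        tasks[name].weight * loss
        for name, loss in aux_losses.items()
        if tasks[name].active
    )


# Simulated validation curves (made-up numbers, for illustration only).
tasks = {
    "pose": TaskState(weight=0.5, patience=2),
    "smiling": TaskState(weight=0.5, patience=2),
}
pose_val = [1.0, 0.8, 0.7, 0.6, 0.5]
smiling_val = [0.9, 0.7, 0.75, 0.8, 0.85]

for epoch, (p, s) in enumerate(zip(pose_val, smiling_val)):
    tasks["pose"].update(p)
    tasks["smiling"].update(s)
    status = {name: state.active for name, state in tasks.items()}
    print(f"epoch {epoch}: active tasks = {status}")
```

In this toy run the "smiling" task stops contributing to the joint loss once its validation loss has stalled, while the landmark (main) loss and the "pose" task keep training. Dropping a stalled auxiliary task is one simple way to keep it from dominating or overfitting while the main task is still converging.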


Face Image; Convolutional Neural Network; Related Task; Deep Neural Network; Facial Landmark
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. Asthana, A., Zafeiriou, S., Cheng, S., Pantic, M.: Robust discriminative response map fitting with constrained local models. In: CVPR, pp. 3444–3451 (2013)
  2. Belhumeur, P.N., Jacobs, D.W., Kriegman, D.J., Kumar, N.: Localizing parts of faces using a consensus of exemplars. In: CVPR, pp. 545–552 (2011)
  3. Burgos-Artizzu, X.P., Perona, P., Dollar, P.: Robust face landmark estimation under occlusion. In: ICCV, pp. 1513–1520 (2013)
  4. Cao, X., Wei, Y., Wen, F., Sun, J.: Face alignment by explicit shape regression. In: CVPR, pp. 2887–2894 (2012)
  5. Caruana, R.: Multitask learning. Machine Learning 28(1), 41–75 (1997)
  6. Chen, K., Gong, S., Xiang, T., Loy, C.C.: Cumulative attribute space for age and crowd density estimation. In: CVPR, pp. 2467–2474 (2013)
  7. Collobert, R., Weston, J.: A unified architecture for natural language processing: Deep neural networks with multitask learning. In: ICML, pp. 160–167 (2008)
  8. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. PAMI 23(6), 681–685 (2001)
  9. Cootes, T.F., Ionita, M.C., Lindner, C., Sauer, P.: Robust and accurate shape model fitting using random forest regression voting. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part VII. LNCS, vol. 7578, pp. 278–291. Springer, Heidelberg (2012)
  10. Dantone, M., Gall, J., Fanelli, G., Van Gool, L.: Real-time facial feature detection using conditional regression forests. In: CVPR, pp. 2578–2585 (2012)
  11. Kostinger, M., Wohlhart, P., Roth, P.M., Bischof, H.: Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In: ICCV Workshops, pp. 2144–2151 (2011)
  12. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
  13. Li, H., Shen, C., Shi, Q.: Real-time visual tracking using compressive sensing. In: CVPR, pp. 1305–1312 (2011)
  14. Liu, X.: Generic face alignment using boosted appearance model. In: CVPR (2007)
  15. Lu, C., Tang, X.: Surpassing human-level face verification performance on LFW with GaussianFace. Tech. rep., arXiv:1404.3840 (2014)
  16. Luo, P., Wang, X., Tang, X.: Hierarchical face parsing via deep learning. In: CVPR, pp. 2480–2487 (2012)
  17. Luo, P., Wang, X., Tang, X.: A deep sum-product architecture for robust facial attributes analysis. In: CVPR, pp. 2864–2871 (2013)
  18. Luxand Incorporated: Luxand face SDK,
  19. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML, pp. 807–814 (2010)
  20. Prechelt, L.: Automatic early stopping using cross validation: quantifying the criteria. Neural Networks 11(4), 761–767 (1998)
  21. Sun, Y., Wang, X., Tang, X.: Deep convolutional network cascade for facial point detection. In: CVPR, pp. 3476–3483 (2013)
  22. Sun, Y., Wang, X., Tang, X.: Deep learning face representation by joint identification-verification. Tech. rep., arXiv:1406.4773 (2014)
  23. Sun, Y., Wang, X., Tang, X.: Deep learning face representation from predicting 10,000 classes. In: CVPR (2014)
  24. Valstar, M., Martinez, B., Binefa, X., Pantic, M.: Facial point detection using boosted regression and graph models. In: CVPR, pp. 2729–2736 (2010)
  25. Xiong, X., De La Torre, F.: Supervised descent method and its applications to face alignment. In: CVPR, pp. 532–539 (2013)
  26. Yang, H., Patras, I.: Sieving regression forest votes for facial feature detection in the wild. In: ICCV, pp. 1936–1943 (2013)
  27. Yu, X., Huang, J., Zhang, S., Yan, W., Metaxas, D.N.: Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model. In: ICCV, pp. 1944–1951 (2013)
  28. Yuan, X.T., Liu, X., Yan, S.: Visual classification with multitask joint sparse representation. TIP 21(10), 4349–4360 (2012)
  29. Zhang, T., Ghanem, B., Liu, S., Ahuja, N.: Robust visual tracking via structured multi-task sparse learning. IJCV 101(2), 367–383 (2013)
  30. Zhang, Y., Yeung, D.Y.: A convex formulation for learning task relationships in multi-task learning. In: UAI (2011)
  31. Zhang, Z., Zhang, W., Liu, J., Tang, X.: Facial landmark localization based on hierarchical pose regression with cascaded random ferns. In: ACM Multimedia, pp. 561–564 (2013)
  32. Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: CVPR, pp. 2879–2886 (2012)
  33. Zhu, Z., Luo, P., Wang, X., Tang, X.: Deep learning identity-preserving face space. In: ICCV, pp. 113–120 (2013)
  34. Zhu, Z., Luo, P., Wang, X., Tang, X.: Deep learning multi-view representation for face recognition. Tech. rep., arXiv:1406.6947 (2014)
  35. Zhu, Z., Luo, P., Wang, X., Tang, X.: Recover canonical-view faces in the wild with deep neural networks. Tech. rep., arXiv:1404.3543 (2014)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Zhanpeng Zhang (1)
  • Ping Luo (1)
  • Chen Change Loy (1)
  • Xiaoou Tang (1)
  1. Dept. of Information Engineering, The Chinese University of Hong Kong, Hong Kong, China
