International Journal of Computer Vision

Volume 127, Issue 5, pp. 437–455

Deep Appearance Models: A Deep Boltzmann Machine Approach for Face Modeling

  • Chi Nhan Duong
  • Khoa Luu
  • Kha Gia Quach
  • Tien D. Bui


The “interpretation through synthesis” approach to analyzing face images, particularly the Active Appearance Models (AAMs) method, has become one of the most successful face modeling approaches over the last two decades. AAMs represent face images through synthesis using a controllable, parameterized Principal Component Analysis (PCA) model. However, the accuracy and robustness of the faces synthesized by AAMs depend heavily on the training sets and, inherently, on the generalizability of the PCA subspaces. This paper presents a novel Deep Appearance Models (DAMs) approach, an efficient replacement for AAMs, that accurately captures both the shape and the texture of face images under large variations. In this approach, three crucial components, represented in hierarchical layers, are modeled using Deep Boltzmann Machines (DBMs) to robustly capture the variations of facial shapes and appearances. DAMs are therefore superior to AAMs in inferring a representation for new face images under various challenging conditions. The proposed approach is evaluated in several applications to demonstrate its robustness and capabilities, namely facial super-resolution reconstruction, facial off-angle reconstruction (face frontalization), facial occlusion removal, and age estimation, using challenging face databases: Labeled Face Parts in the Wild, Helen, and FG-NET. Compared to AAMs and other deep learning based approaches, the proposed DAMs achieve competitive results in these applications, demonstrating their advantages in handling occlusions, facial representation, and reconstruction.
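To make the "controllable parameterized PCA model" behind AAMs concrete, the following is a minimal pure-Python sketch of linear PCA synthesis, x = x̄ + Σ p_i φ_i, together with the projection that recovers the parameters p_i. It assumes an orthonormal basis has already been learned by PCA; the function names, the toy 4-dimensional shape vectors (two 2D landmarks), and the basis values are hypothetical illustrations, not taken from the paper. The reconstruction error illustrates the limitation the abstract describes: any variation orthogonal to the learned subspace cannot be represented.

```python
from typing import List

Vector = List[float]


def synthesize(mean: Vector, basis: List[Vector], params: Vector) -> Vector:
    """Linear PCA synthesis: x = mean + sum_i params[i] * basis[i]."""
    out = list(mean)
    for p, b in zip(params, basis):
        for j, v in enumerate(b):
            out[j] += p * v
    return out


def project(mean: Vector, basis: List[Vector], x: Vector) -> Vector:
    """Model parameters for x; valid only if the basis vectors are orthonormal."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(c * v for c, v in zip(centered, b)) for b in basis]


def reconstruction_error(mean: Vector, basis: List[Vector], x: Vector) -> float:
    """Squared distance between x and its best approximation inside the PCA subspace."""
    recon = synthesize(mean, basis, project(mean, basis, x))
    return sum((xi - ri) ** 2 for xi, ri in zip(x, recon))


# Hypothetical toy model: two 2D landmarks flattened to (x1, y1, x2, y2),
# with an orthonormal 2-component basis standing in for a learned PCA basis.
MEAN = [0.0, 0.0, 1.0, 0.0]
BASIS = [[0.6, 0.8, 0.0, 0.0], [0.0, 0.0, 0.6, 0.8]]

if __name__ == "__main__":
    face = synthesize(MEAN, BASIS, [1.0, -0.5])
    print(face)  # approximately [0.6, 0.8, 0.7, -0.4]
    print(project(MEAN, BASIS, face))  # approximately [1.0, -0.5]
    print(reconstruction_error(MEAN, BASIS, face))  # approximately 0.0
    unseen = [0.8, -0.6, 1.0, 0.0]  # deviation orthogonal to both components
    print(reconstruction_error(MEAN, BASIS, unseen))  # approximately 1.0
```

In this toy setting, a shape generated by the model is recovered exactly, while the unseen shape is mapped back to the mean and its whole deviation is lost. This is the kind of dependence on the PCA subspace that, according to the abstract, DAMs address by replacing the linear models with DBM-based layers.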


Keywords: Deep Boltzmann machines, Deep appearance models, Active appearance models, Principal component analysis, Facial super-resolution reconstruction, Facial off-angle reconstruction, Face frontalization, Age estimation



This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Computer Science and Software Engineering Department, Concordia University, Montreal, Canada
  2. CyLab Biometrics Center and the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
  3. Computer Science and Computer Engineering Department, University of Arkansas, Fayetteville, USA
