Markov Chain Monte Carlo for Automated Face Image Analysis

International Journal of Computer Vision

Abstract

We present a novel fully probabilistic method to interpret a single face image with the 3D Morphable Model. The new method is based on Bayesian inference and makes use of unreliable image-based information. Rather than searching for a single optimal solution, we infer the posterior distribution of the model parameters given the target image. The method is a stochastic sampling algorithm with a propose-and-verify architecture based on the Metropolis–Hastings algorithm. The stochastic method can robustly integrate unreliable information and therefore does not rely on feed-forward initialization. The integrative concept is based on two ideas: a separation of proposal moves and their verification with the model (Data-Driven Markov Chain Monte Carlo), and filtering with the Metropolis acceptance rule. It does not need gradients and is less prone to local optima than standard fitters. We also introduce a new collective likelihood which models the average difference between the model and the target image rather than individual pixel differences. The average value shows a natural tendency towards a normal distribution, even when the individual pixel-wise difference is not Gaussian. We employ the new fitting method to calculate posterior models of 3D face reconstructions from single real-world images. A direct application of the algorithm with the 3D Morphable Model leads us to a fully automatic face recognition system with competitive performance on the Multi-PIE database without any database adaptation.

References

  • Albrecht, T., Lüthi, M., Gerig, T., & Vetter, T. (2013). Posterior shape models. Medical Image Analysis, 17(8), 959–973. doi:10.1016/j.media.2013.05.010.

  • Aldrian, O., & Smith, W. (2013). Inverse rendering of faces with a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(5), 1080–1093. doi:10.1109/TPAMI.2012.206.

  • Basri, R., & Jacobs, D. W. (2003). Lambertian reflectance and linear subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2), 218–233.

  • Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3d faces. In SIGGRAPH ’99: Proceedings of the 26th annual conference on computer graphics and interactive techniques (pp. 187–194). New York: ACM Press/Addison-Wesley. doi:10.1145/311535.311556.

  • Blanz, V., & Vetter, T. (2003). Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9), 1063–1074. doi:10.1109/TPAMI.2003.1227983.

  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. doi:10.1023/A:1010933404324.

  • Chib, S., & Greenberg, E. (1995). Understanding the Metropolis–Hastings algorithm. The American Statistician, 49(4), 327–335.

  • Cootes, T., Edwards, G., & Taylor, C. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 681–685. doi:10.1109/34.927467.

  • Duane, S., Kennedy, A. D., Pendleton, B. J., & Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters B, 195(2), 216–222.

  • Eckhardt, M., Fasel, I., & Movellan, J. (2009). Towards practical facial feature detection. International Journal of Pattern Recognition and Artificial Intelligence, 23(03), 379–400.

  • Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2009). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338. doi:10.1007/s11263-009-0275-4.

  • Felzenszwalb, P. F., & Huttenlocher, D. P. (2012). Distance transforms of sampled functions. Theory of Computing, 8(1), 415–428. doi:10.4086/toc.2012.v008a019.

  • Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (1996). Markov chain Monte Carlo in practice (Vol. 2). Boca Raton, FL: CRC Press.

  • Gonick, L., & Smith, W. (1993). Cartoon guide to statistics. New York: HarperCollins.

  • Gross, R., Matthews, I., Cohn, J., Kanade, T., & Baker, S. (2010). Multi-PIE. Image and Vision Computing, 28(5), 807–813. doi:10.1016/j.imavis.2009.08.002.

  • Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97–109. doi:10.1093/biomet/57.1.97.

  • Huang, G.B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Tech. Rep. 07-49, University of Massachusetts, Amherst.

  • Jampani, V., Nowozin, S., Loper, M., & Gehler, P. V. (2015). The informed sampler: A discriminative approach to bayesian inference in generative computer vision models. Computer Vision and Image Understanding, 136, 32–44. doi:10.1016/j.cviu.2015.03.002.

  • Kirby, M., & Sirovich, L. (1990). Application of the Karhunen–Loeve procedure for the characterization of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1), 103–108.

  • Köstinger, M., Wohlhart, P., Roth, P. M., & Bischof, H. (2011). Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In 2011 IEEE international conference on computer vision workshops (ICCV workshops) (pp. 2144–2151).

  • Kulkarni, T. D., Kohli, P., Tenenbaum, J. B., & Mansinghka, V. (2015). Picture: A probabilistic programming language for scene perception. In The IEEE conference on computer vision and pattern recognition (CVPR).

  • Liu, C., Shum, H. Y., & Zhang, C. (2002). Hierarchical shape modeling for automatic face localization. In Computer Vision—ECCV 2002 (pp. 687–703). Heidelberg: Springer.

  • Liu, J. S., Liang, F., & Wong, W. H. (2000). The multiple-try method and local optimization in Metropolis sampling. Journal of the American Statistical Association, 95(449), 121–134.

  • Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.

  • Lüthi, M., Blanc, R., Albrecht, T., Gass, T., Goksel, O., Buchler, P., et al. (2012). Statismo—A framework for PCA based statistical models. The Insight Journal, 1, 1–18.

  • Matthews, I., & Baker, S. (2004). Active appearance models revisited. International Journal of Computer Vision, 60(2), 135–164.

  • Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21, 1087.

  • Paysan, P., Knothe, R., Amberg, B., Romdhani, S., & Vetter, T. (2009). A 3D face model for pose and illumination invariant face recognition. In Advanced video and signal based surveillance, 2009 (pp. 296–301).

  • Perlin, K. (1985). An image synthesizer. ACM SIGGRAPH Computer Graphics, 19(3), 287–296.

  • Rauschert, I., & Collins, R. T. (2012). A generative model for simultaneous estimation of human body shape and pixel-level segmentation. In Computer Vision—ECCV 2012 (pp. 704–717). Heidelberg: Springer.

  • Robert, C. P., & Casella, G. (2004). Monte Carlo statistical methods (Vol. 319). Citeseer.

  • Romdhani, S., & Vetter, T. (2003). Efficient, robust and accurate fitting of a 3D morphable model. In Proceedings of ninth IEEE international conference on computer vision, 2003 (pp. 59–66).

  • Romdhani, S., & Vetter, T. (2005). Estimating 3D shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior. In IEEE Computer Society conference on computer vision and pattern recognition, 2005 (CVPR 2005) (Vol. 2, pp. 986–993). doi:10.1109/CVPR.2005.145.

  • Sambridge, M., & Mosegaard, K. (2002). Monte Carlo methods in geophysical inverse problems. Reviews of Geophysics, 40(3), 1009. doi:10.1029/2000RG000089.

  • Schönborn, S., Egger, B., Forster, A., & Vetter, T. (2015). Background modeling for generative image models. Computer Vision and Image Understanding, 136, 117–127. doi:10.1016/j.cviu.2015.01.008.

  • Schönborn, S., Forster, A., Egger, B., & Vetter, T. (2013). A Monte Carlo strategy to integrate detection and model-based face analysis. In J. Weickert, M. Hein, & B. Schiele (Eds.), Pattern recognition. Lecture notes in computer science (Vol. 8142, pp. 101–110). Berlin: Springer.

  • Tipping, M. E., & Bishop, C. M. (1999). Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3), 611–622.

  • Tu, Z., Chen, X., Yuille, A. L., & Zhu, S. C. (2005). Image parsing: Unifying segmentation, detection, and recognition. International Journal of Computer Vision, 63(2), 113–140. doi:10.1007/s11263-005-6642-x.

  • Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1), 71–86.

  • Wojek, C., Roth, S., Schindler, K., & Schiele, B. (2010). Monocular 3d scene modeling and inference: Understanding multi-object traffic scenes. In Computer Vision—ECCV 2010 (pp. 467–481). Heidelberg: Springer.

  • Xiong, X., & De La Torre, F. (2013). Supervised descent method and its applications to face alignment. In 2013 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 532–539). doi:10.1109/CVPR.2013.75.

  • Yin, L., Wei, X., Sun, Y., Wang, J., & Rosato, M. (2006). A 3D facial expression database for facial behavior research. In 7th International conference on automatic face and gesture recognition, 2006 (FGR 2006) (pp. 211–216). doi:10.1109/FGR.2006.6.

  • Zhu, X., Yan, J., Yi, D., Lei, Z., & Li, S. (2015). Discriminative 3d morphable model fitting. In Proceedings of 11th IEEE international conference on automatic face and gesture recognition FG2015, Ljubljana, Slovenia.

  • Zivanov, J., Forster, A., Schönborn, S., & Vetter, T. (2013). Human face shape analysis under spherical harmonics illumination considering self occlusion. In 6th International conference on biometrics, ICB-2013, Madrid.

Author information

Corresponding author

Correspondence to Sandro Schönborn.

Additional information

Communicated by T.E. Boult.

Appendices

Appendix 1: The Face Model

Face model The matrix \(\mathbf {U}\) contains the principal components. The diagonal matrix \(\mathbf {D}\) differs slightly from a standard PCA, where \(\tilde{\mathbf {D}}^2\) would contain the eigenvalues \(\lambda _i\) of the covariance matrix \(\mathbf {\Sigma }= \mathbf {U} \tilde{\mathbf {D}}^2 \mathbf {U}^T\). We modify \(\mathbf {D}\) to match the maximum-likelihood estimators of Tipping and Bishop (1999):

$$\begin{aligned} \mathbf {D}^2 = \tilde{\mathbf {D}}^2 - \sigma ^2 \mathbf {I} \end{aligned}$$
(27)

We estimate the missing standard deviation parameter \(\sigma \) of the PPCA model as the Root Mean Square (RMS) reconstruction error of 3D faces, for both shape and texture. Reconstructing the 10 BFM out-of-sample faces, we obtain RMS reconstruction errors of \({{\hat{\sigma }}}_S = 0.61\) mm for the shape part and \({{\hat{\sigma }}}_C = 0.047\) for the color. Note that all color values are RGB floating point numbers in the interval [0, 1].
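The correction in Eq. (27) and the RMS estimate of \(\sigma \) can be sketched in a few lines of Python. This is only an illustration: the function names, the example eigenvalues, the clamping of negative variances to zero and the per-coordinate RMS are assumptions, not details taken from the paper.

```python
import math

def ppca_scale(pca_variances, sigma):
    """Diagonal of D from the PCA eigenvalues (diagonal of D~^2) and noise level sigma."""
    scaled = []
    for lam in pca_variances:
        # Eq. (27): D^2 = D~^2 - sigma^2 I, applied to each component.
        reduced = lam - sigma ** 2
        # Assumption: variances below the noise floor are clamped to zero.
        scaled.append(math.sqrt(max(reduced, 0.0)))
    return scaled

def rms_error(original, reconstruction):
    """Root-mean-square difference over coordinates (assumed definition)."""
    squared = [(a - b) ** 2 for a, b in zip(original, reconstruction)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical eigenvalues in mm^2, combined with the reported shape noise of 0.61 mm.
eigenvalues = [25.0, 9.0, 1.0, 0.2]
d = ppca_scale(eigenvalues, sigma=0.61)
```

Components whose PCA variance is smaller than \(\sigma ^2\) carry no signal beyond the noise model in this sketch, which is why the last entry of `d` is zero.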

To better adapt the model to real images, we restrict it to the face region, without ears and throat. A rendering of the mean face of the masked model can be found in Fig. 2. We recalculate the statistics using only the restricted face mask to keep the model statistically valid and orthogonal.

All model parts of our software concerning statistical shape modeling are implemented using the Statismo framework (Lüthi et al. 2012).

Scene model The pinhole camera is modeled with focal length f and an offset \(\mathbf {o}\) of the principal point within the image plane of size \(w \times h\) pixels. The complete 3D-to-2D projection is then

$$\begin{aligned} \mathbf {x}_{2D}&= \left( {\mathcal {P}} \circ T \circ R_Z \circ R_Y \circ R_X \right) \left( \mathbf {x}_{3D} \right) \end{aligned}$$
(28)
$$\begin{aligned} {\mathcal {P}} \left( \mathbf {r} \right)&= \begin{bmatrix} wf r_x / r_z + o_x \\ hf r_y/r_z + o_y \end{bmatrix}. \end{aligned}$$
(29)
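The projection chain of Eqs. (28) and (29) can be written out as a minimal Python sketch. The function names, the argument layout and the rotation sign conventions are assumptions made for illustration; only the order of operations and the form of \({\mathcal {P}}\) follow the equations above.

```python
import math

def rot_x(p, a):
    x, y, z = p
    c, s = math.cos(a), math.sin(a)
    return (x, c * y - s * z, s * y + c * z)

def rot_y(p, a):
    x, y, z = p
    c, s = math.cos(a), math.sin(a)
    return (c * x + s * z, y, -s * x + c * z)

def rot_z(p, a):
    x, y, z = p
    c, s = math.cos(a), math.sin(a)
    return (c * x - s * y, s * x + c * y, z)

def project(p, angles, translation, f, offset, size):
    """Eq. (28): apply R_X first, then R_Y, R_Z, T and finally the pinhole map P."""
    ax, ay, az = angles
    r = rot_z(rot_y(rot_x(p, ax), ay), az)
    r = tuple(ri + ti for ri, ti in zip(r, translation))
    w, h = size
    # Eq. (29): perspective division, scaled by image size and focal length.
    return (w * f * r[0] / r[2] + offset[0], h * f * r[1] / r[2] + offset[1])
```

For example, a point at the model origin that is translated along the optical axis lands exactly on the principal point offset, whatever the rotation angles are.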

Illumination The radiance \(p_i\) of a point i on the face surface with normal \(\mathbf {n}_i\) and albedo \(a_i\) can be expressed using an expansion into real Spherical Harmonics basis functions \(Y_{lm}\)

$$\begin{aligned} p_i = a_i \sum _{l=0}^2 \sum _{m=-l}^l Y_{lm}(\mathbf {n}_i) L_{lm} k_l. \end{aligned}$$
(30)

The above equation is per color channel. The expansion of the environment map is captured in the illumination parameters \(L_{lm}\), whereas the expansion of the Lambert reflectance kernel is given by \(k_l\). For details, including the coefficient values \(k_l\) of the expansion, refer to Basri and Jacobs (2003).

The final image is produced by rasterization of all triangles in the face model. We use a Phong shading approach with a varying, interpolated normal for each pixel.

Illumination estimation As the light model (30) is linear in the illumination coefficients for a given geometry, the expansion coefficients \(L_{lm}^c\) for each color channel c are estimated by solving a linear least-squares system with one equation per vertex i, where the single index \(l'\) enumerates the nine \((l, m)\) pairs of Eq. (30):

$$\begin{aligned} \sum _{l'=1}^9 Y_{l'}(\mathbf {n}_i) k_{l'} a_i^c L_{l'}^c = p_i^c. \end{aligned}$$
(31)

We solve the above system on 1000 randomly selected visible vertices i.
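The forward model of Eq. (30) and the least-squares estimate of Eq. (31) can be illustrated together with NumPy. The sketch below shades synthetic vertices with known coefficients and then recovers them. The kernel values \(\pi \), \(2\pi /3\), \(\pi /4\) are a commonly used normalization and an assumption here, since the text defers to Basri and Jacobs (2003) for the exact values; the function names and the random test data are likewise hypothetical.

```python
import numpy as np

# Lambertian kernel coefficients k_l for l = 0, 1, 2, repeated once per (l, m) pair.
# Assumption: the common pi, 2*pi/3, pi/4 normalization.
K = np.array([np.pi] + [2 * np.pi / 3] * 3 + [np.pi / 4] * 5)

def sh_basis(normals):
    """Real spherical harmonics Y_lm up to l = 2 for unit normals, shape (n, 9)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z ** 2 - 1),
        1.092548 * x * z, 0.546274 * (x ** 2 - y ** 2),
    ], axis=1)

def shade(normals, albedo, light):
    """Eq. (30) for one color channel: radiance per vertex."""
    return albedo * (sh_basis(normals) @ (K * light))

def estimate_light(normals, albedo, radiance):
    """Eq. (31): least-squares solve for the nine coefficients of one channel."""
    design = sh_basis(normals) * K * albedo[:, None]
    light, *_ = np.linalg.lstsq(design, radiance, rcond=None)
    return light

# Synthetic stand-in for 1000 randomly selected visible vertices.
rng = np.random.default_rng(0)
normals = rng.normal(size=(1000, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = rng.uniform(0.2, 0.9, size=1000)
true_light = rng.normal(size=9)
recovered = estimate_light(normals, albedo, shade(normals, albedo, true_light))
```

With noise-free synthetic radiance the recovered coefficients match the ones used for shading. On real images the radiance comes from the target pixels, so the solution is only a best fit.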

Product likelihood normalization The distribution is centered at the color value of the synthetic image and normalized to account for the truncation due to limited intensity values through

$$\begin{aligned} N= \int _{0}^{1} \int _{0}^{1} \int _{0}^{1} \exp \left( -\frac{\left\| t - I_i(\theta ) \right\| ^2}{2 \sigma ^2} \right) dt_R \, dt_G \, dt_B \end{aligned}$$
(32)

which can be calculated using the error function. As an approximation, the normalization can also be replaced by a standard Gaussian normalization if the standard deviation is much smaller than the width of the intensity interval and the color channels are neither saturated nor zero. This is a valid assumption for the majority of typical face images.
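Because the integrand in Eq. (32) factorizes over the three color channels, the normalization reduces to a product of one-dimensional error-function terms. The following sketch shows this and compares it with the untruncated Gaussian normalization; the function names and the value of \(\sigma \) used in the usage note are hypothetical.

```python
import math

def channel_mass(mu, sigma, lo=0.0, hi=1.0):
    """Integral of exp(-(t - mu)^2 / (2 sigma^2)) for t in [lo, hi], via erf."""
    scale = sigma * math.sqrt(2.0)
    return sigma * math.sqrt(math.pi / 2.0) * (
        math.erf((hi - mu) / scale) - math.erf((lo - mu) / scale))

def truncated_norm(color, sigma):
    """Eq. (32): N as a product over the R, G and B channels."""
    n = 1.0
    for mu in color:
        n *= channel_mass(mu, sigma)
    return n

def gaussian_norm(sigma):
    """Untruncated approximation: (sigma * sqrt(2 pi))^3."""
    return (sigma * math.sqrt(2.0 * math.pi)) ** 3
```

For a mid-range color such as `(0.5, 0.4, 0.6)` with `sigma = 0.05`, both normalizations agree to numerical precision. For a color with one saturated channel, such as `(1.0, 0.5, 0.5)`, the truncated value is half the Gaussian one, which is the case the approximation excludes.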

Appendix 2: Random Walk Proposals

The standard random walk proposal type is a Gaussian update:

$$\begin{aligned} Q:~~\theta '&= \theta + d \qquad d \sim {\mathcal {N}}(d|0, \sigma ). \end{aligned}$$
(33)
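As a minimal illustration of how the proposal in Eq. (33) is combined with the Metropolis acceptance rule, the following sketch samples a toy one-dimensional target. The target density, step size and function names are hypothetical and unrelated to the face model; the sketch only shows the propose-and-verify loop for a symmetric proposal.

```python
import math
import random

def random_walk_metropolis(log_target, theta0, sigma, n_steps, seed=0):
    rng = random.Random(seed)
    theta, log_p = theta0, log_target(theta0)
    samples, accepted = [], 0
    for _ in range(n_steps):
        # Eq. (33): symmetric Gaussian proposal centered at the current state.
        proposal = theta + rng.gauss(0.0, sigma)
        log_p_new = log_target(proposal)
        # Metropolis rule: accept with probability min(1, p(theta') / p(theta)).
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            theta, log_p = proposal, log_p_new
            accepted += 1
        samples.append(theta)
    return samples, accepted / n_steps

# Toy target (hypothetical): a Gaussian with mean 2 and standard deviation 0.5.
def log_target(t):
    return -0.5 * ((t - 2.0) / 0.5) ** 2

samples, rate = random_walk_metropolis(log_target, theta0=0.0, sigma=0.5, n_steps=20000)
kept = samples[2000:]  # discard an initial burn-in phase
mean = sum(kept) / len(kept)
```

Only the ratio of target values enters the acceptance test, so the target needs to be known only up to a constant and no gradients are required.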

Camera model The proposals change three Euler angles of rotation, three directions of translation, the principal point in the image plane and the focal length. All of these are updated independently, only one at a time, using a selected variance for each. Additionally, 3D rotation proposals are compensated for unwanted movements of the face within the image plane such that it is kept at a fixed position in the image.

Table 11 Random walk proposals: \(\sigma \) is the standard deviation of the normal distribution, centered at the current location; \(\lambda \) designates the mixture coefficients of the different scales: coarse (C), intermediate (I) and fine (F). The values were obtained empirically

Face model The updates of the 3DMM’s shape and texture models consist of two types of parameter variations. First, there is the addition of uncorrelated Gaussian noise to all parameters. Second, there is a scaling of the total parameter vector length with a log-normal distribution (distance from mean, “caricature”). Proposals are generated by

$$\begin{aligned} Q_S:~~{\varvec{\theta }}'_{\text {S}}&= {\varvec{\theta }}_{\text {S}} + d \qquad d \sim {\mathcal {N}}(d|0, \sigma _{\text {S}}\mathbf {I}) \end{aligned}$$
(34)
$$\begin{aligned} {\varvec{\theta }}'_{\text {S}}&= {\varvec{\theta }}_{\text {S}} \times \lambda \qquad \lambda \sim \log {\mathcal {N}}(1, \sigma _{\text {SL}}). \end{aligned}$$
(35)
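The two move types of Eqs. (34) and (35) can be sketched as follows. The function names and the mixture weight `p_caricature` are hypothetical, and the log-normal factor is parameterized so that \(\log \lambda \) is normal with mean 0, which is an assumed reading of \(\log {\mathcal {N}}(1, \sigma _{\text {SL}})\).

```python
import math
import random

def gaussian_proposal(theta, sigma_s, rng):
    """Eq. (34): add independent Gaussian noise to every coefficient."""
    return [t + rng.gauss(0.0, sigma_s) for t in theta]

def caricature_proposal(theta, sigma_sl, rng):
    """Eq. (35): rescale the whole vector by one log-normal factor."""
    # Assumption: log(lambda) ~ N(0, sigma_sl), so the median factor is 1.
    lam = math.exp(rng.gauss(0.0, sigma_sl))
    return [t * lam for t in theta]

def mixture_proposal(theta, sigma_s, sigma_sl, p_caricature, rng):
    """Pick one of the two moves at random; p_caricature is a hypothetical weight."""
    if rng.random() < p_caricature:
        return caricature_proposal(theta, sigma_sl, rng)
    return gaussian_proposal(theta, sigma_s, rng)
```

The scaling move changes only the length of the parameter vector and keeps its direction. It is not symmetric in \({\varvec{\theta }}\), so a sampler using it may need a corresponding correction in the acceptance ratio, which this sketch omits.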

Illumination The illumination coefficients are updated with a mixture of a perturbation, an intensity and a color proposal. The perturbation is a standard independent Gaussian acting on all coefficients at once. The intensity proposal scales all coefficients by a factor drawn from a log-normal distribution, and the color proposal keeps the intensity constant while perturbing the coefficients.

Table 11 contains a detailed overview.

About this article

Cite this article

Schönborn, S., Egger, B., Morel-Forster, A. et al. Markov Chain Monte Carlo for Automated Face Image Analysis. Int J Comput Vis 123, 160–183 (2017). https://doi.org/10.1007/s11263-016-0967-5

  • DOI: https://doi.org/10.1007/s11263-016-0967-5
