Markov Chain Monte Carlo for Automated Face Image Analysis

Schönborn, Sandro; Egger, Bernhard; Morel-Forster, Andreas; Vetter, Thomas

doi:10.1007/s11263-016-0967-5


Published: 02 November 2016

Volume 123, pages 160–183 (2017)

International Journal of Computer Vision



Abstract

We present a novel fully probabilistic method to interpret a single face image with the 3D Morphable Model. The new method is based on Bayesian inference and makes use of unreliable image-based information. Rather than searching for a single optimal solution, we infer the posterior distribution of the model parameters given the target image. The method is a stochastic sampling algorithm with a propose-and-verify architecture based on the Metropolis–Hastings algorithm. The stochastic method can robustly integrate unreliable information and therefore does not rely on feed-forward initialization. The integrative concept is based on two ideas: the separation of proposal moves from their verification with the model (Data-Driven Markov Chain Monte Carlo), and filtering with the Metropolis acceptance rule. The method does not need gradients and is less prone to local optima than standard fitters. We also introduce a new collective likelihood which models the average difference between the model and the target image rather than individual pixel differences. The average value shows a natural tendency towards a normal distribution, even when the individual pixel-wise difference is not Gaussian. We employ the new fitting method to calculate posterior models of 3D face reconstructions from single real-world images. A direct application of the algorithm with the 3D Morphable Model leads us to a fully automatic face recognition system with competitive performance on the Multi-PIE database without any database adaptation.


References

Albrecht, T., Lüthi, M., Gerig, T., & Vetter, T. (2013). Posterior shape models. Medical Image Analysis, 17(8), 959–973. doi:10.1016/j.media.2013.05.010.
Aldrian, O., & Smith, W. (2013). Inverse rendering of faces with a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(5), 1080–1093. doi:10.1109/TPAMI.2012.206.
Basri, R., & Jacobs, D. W. (2003). Lambertian reflectance and linear subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2), 218–233.
Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3d faces. In SIGGRAPH ’99: Proceedings of the 26th annual conference on computer graphics and interactive techniques (pp. 187–194). New York: ACM Press/Addison-Wesley. doi:10.1145/311535.311556.
Blanz, V., & Vetter, T. (2003). Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9), 1063–1074. doi:10.1109/TPAMI.2003.1227983.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. doi:10.1023/A:1010933404324.
Chib, S., & Greenberg, E. (1995). Understanding the Metropolis–Hastings algorithm. The American Statistician, 49(4), 327–335.
Cootes, T., Edwards, G., & Taylor, C. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 681–685. doi:10.1109/34.927467.
Duane, S., Kennedy, A. D., Pendleton, B. J., & Roweth, D. (1987). Hybrid Monte Carlo. Physics Letters B, 195(2), 216–222.
Eckhardt, M., Fasel, I., & Movellan, J. (2009). Towards practical facial feature detection. International Journal of Pattern Recognition and Artificial Intelligence, 23(03), 379–400.
Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2009). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338. doi:10.1007/s11263-009-0275-4.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2012). Distance transforms of sampled functions. Theory of Computing, 8(1), 415–428. doi:10.4086/toc.2012.v008a019.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (1996). Markov chain Monte Carlo in practice (Vol. 2). Boca Raton, FL: CRC Press.
Gonick, L., & Smith, W. (1993). Cartoon guide to statistics. New York: HarperCollins.
Gross, R., Matthews, I., Cohn, J., Kanade, T., & Baker, S. (2010). Multi-PIE. Image and Vision Computing, 28(5), 807–813. doi:10.1016/j.imavis.2009.08.002.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97–109. doi:10.1093/biomet/57.1.97.
Huang, G. B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Tech. Rep. 07-49, University of Massachusetts, Amherst.
Jampani, V., Nowozin, S., Loper, M., & Gehler, P. V. (2015). The informed sampler: A discriminative approach to bayesian inference in generative computer vision models. Computer Vision and Image Understanding, 136, 32–44. doi:10.1016/j.cviu.2015.03.002.
Kirby, M., & Sirovich, L. (1990). Application of the Karhunen–Loeve procedure for the characterization of human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1), 103–108.
Köstinger, M., Wohlhart, P., Roth, P. M., & Bischof, H. (2011). Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In 2011 IEEE international conference on computer vision workshops (ICCV workshops) (pp. 2144–2151).
Kulkarni, T. D., Kohli, P., Tenenbaum, J. B., & Mansinghka, V. (2015). Picture: A probabilistic programming language for scene perception. In The IEEE conference on computer vision and pattern recognition (CVPR).
Liu, C., Shum, H. Y., & Zhang, C. (2002). Hierarchical shape modeling for automatic face localization. In Computer Vision—ECCV 2002 (pp. 687–703). Heidelberg: Springer.
Liu, J. S., Liang, F., & Wong, W. H. (2000). The multiple-try method and local optimization in Metropolis sampling. Journal of the American Statistical Association, 95(449), 121–134.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Lüthi, M., Blanc, R., Albrecht, T., Gass, T., Goksel, O., Buchler, P., et al. (2012). Statismo—A framework for PCA based statistical models. The Insight Journal, 1, 1–18.
Matthews, I., & Baker, S. (2004). Active appearance models revisited. International Journal of Computer Vision, 60(2), 135–164.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21, 1087.
Paysan, P., Knothe, R., Amberg, B., Romdhani, S., & Vetter, T. (2009). A 3D face model for pose and illumination invariant face recognition. In Advanced video and signal based surveillance, 2009 (pp. 296–301).
Perlin, K. (1985). An image synthesizer. ACM SIGGRAPH Computer Graphics, 19(3), 287–296.
Rauschert, I., & Collins, R. T. (2012). A generative model for simultaneous estimation of human body shape and pixel-level segmentation. In Computer Vision—ECCV 2012 (pp. 704–717). Heidelberg: Springer.
Robert, C. P., & Casella, G. (2004). Monte Carlo statistical methods (Vol. 319). Citeseer.
Romdhani, S., & Vetter, T. (2003). Efficient, robust and accurate fitting of a 3D morphable model. In Proceedings of ninth IEEE international conference on computer vision, 2003 (pp. 59–66).
Romdhani, S., & Vetter, T. (2005). Estimating 3D shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior. In IEEE Computer Society conference on computer vision and pattern recognition, 2005 (CVPR 2005) (Vol. 2, pp. 986–993). doi:10.1109/CVPR.2005.145.
Sambridge, M., & Mosegaard, K. (2002). Monte Carlo methods in geophysical inverse problems. Reviews of Geophysics, 40(3), 1009. doi:10.1029/2000RG000089.
Schönborn, S., Egger, B., Forster, A., & Vetter, T. (2015). Background modeling for generative image models. Computer Vision and Image Understanding, 136, 117–127. doi:10.1016/j.cviu.2015.01.008.
Schönborn, S., Forster, A., Egger, B., & Vetter, T. (2013). A Monte Carlo strategy to integrate detection and model-based face analysis. In J. Weickert, M. Hein, & B. Schiele (Eds.), Pattern recognition. Lecture notes in computer science (Vol. 8142, pp. 101–110). Berlin: Springer.
Tipping, M. E., & Bishop, C. M. (1999). Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3), 611–622.
Tu, Z., Chen, X., Yuille, A. L., & Zhu, S. C. (2005). Image parsing: Unifying segmentation, detection, and recognition. International Journal of Computer Vision, 63(2), 113–140. doi:10.1007/s11263-005-6642-x.
Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1), 71–86.
Wojek, C., Roth, S., Schindler, K., & Schiele, B. (2010). Monocular 3d scene modeling and inference: Understanding multi-object traffic scenes. In Computer Vision—ECCV 2010 (pp. 467–481). Heidelberg: Springer.
Xiong, X., & De La Torre, F. (2013). Supervised descent method and its applications to face alignment. In 2013 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 532–539). doi:10.1109/CVPR.2013.75.
Yin, L., Wei, X., Sun, Y., Wang, J., & Rosato, M. (2006). A 3D facial expression database for facial behavior research. In 7th International conference on automatic face and gesture recognition, 2006 (FGR 2006) (pp. 211–216). doi:10.1109/FGR.2006.6.
Zhu, X., Yan, J., Yi, D., Lei, Z., & Li, S. (2015). Discriminative 3d morphable model fitting. In Proceedings of 11th IEEE international conference on automatic face and gesture recognition FG2015, Ljubljana, Slovenia.
Zivanov, J., Forster, A., Schönborn, S., & Vetter, T. (2013). Human face shape analysis under spherical harmonics illumination considering self occlusion. In 6th International conference on biometrics, ICB-2013, Madrid.


Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Basel, Spiegelgasse 1, 4051, Basel, Switzerland
Sandro Schönborn, Bernhard Egger, Andreas Morel-Forster & Thomas Vetter


Corresponding author

Correspondence to Sandro Schönborn.

Additional information

Communicated by T.E. Boult.

Appendices

Appendix 1: The Face Model

Face model The matrix $\mathbf {U}$ contains the principal components. The diagonal matrix $\mathbf {D}$ differs slightly from that of a standard PCA, where $\tilde{\mathbf {D}}^2$ contains the eigenvalues $\lambda _i$ of the covariance matrix $\mathbf {\Sigma }= \mathbf {U} \tilde{\mathbf {D}}^2 \mathbf {U}^T$. We modify $\mathbf {D}$ to correspond to the Maximum-Likelihood estimators proposed by Tipping and Bishop (1999):

$$\begin{aligned} \mathbf {D}^2 = \tilde{\mathbf {D}}^2 - \sigma ^2 \mathbf {I} \end{aligned}$$

(27)

We estimate the missing standard deviation parameter $\sigma $ of the PPCA model as the Root Mean Square (RMS) reconstruction error of 3D faces, for both shape and texture. Reconstructing the 10 BFM out-of-sample faces, we obtain RMS reconstruction errors of ${{\hat{\sigma }}}_S = 0.61$ mm for the shape part and ${{\hat{\sigma }}}_C = 0.047$ for the color. Note that all color values are RGB floating point numbers in the interval [0, 1].
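The adjustment in (27) can be illustrated with a minimal sketch. The function name `ppca_scaling` and the clipping at zero are assumptions made for this illustration; the text does not say how components with eigenvalues below $\sigma ^2$ are handled.

```python
import numpy as np

def ppca_scaling(eigenvalues, sigma):
    """Return the diagonal of D from the PCA eigenvalues (diagonal of D-tilde squared).

    Implements D^2 = D-tilde^2 - sigma^2 * I, element-wise on the diagonal.
    """
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    d_squared = eigenvalues - sigma ** 2
    # Assumption: clip at zero so that the square root stays real.
    d_squared = np.clip(d_squared, 0.0, None)
    return np.sqrt(d_squared)

# Hypothetical eigenvalues, only to show the call.
d = ppca_scaling([4.0, 1.0, 0.25], sigma=0.5)
```

The returned values are the per-component standard deviations after the noise variance has been subtracted.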

To better adapt the model to real images, we restrict it to the face region, excluding the ears and throat. A rendering of the mean face of the masked model can be found in Fig. 2. We recalculate the statistics using only the restricted face mask to keep the model statistically valid and orthogonal.

All model parts of our software concerning statistical shape modeling are implemented using the Statismo framework (Lüthi et al. 2012).

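Scene model The pinhole camera is modeled with focal length f and an offset $\mathbf {o}$ of the principal point within the image plane of size $w \times h$ pixels. With rotations $R_X$, $R_Y$, $R_Z$ about the three coordinate axes, a translation $T$ and the perspective projection ${\mathcal {P}}$, the complete 3D-to-2D projection is then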

$$\begin{aligned} \mathbf {x}_{2D}&= \left( {\mathcal {P}} \circ T \circ R_Z \circ R_Y \circ R_X \right) \left( \mathbf {x}_{3D} \right) \end{aligned}$$

(28)

$$\begin{aligned} {\mathcal {P}} \left( \mathbf {r} \right)&= \begin{bmatrix} wf r_x / r_z + o_x \\ hf r_y/r_z + o_y \end{bmatrix}. \end{aligned}$$

(29)
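The projection chain of (28) and (29) could be sketched as follows. The right-handed rotation matrices, angles in radians and a camera looking along the positive z axis are assumptions of this sketch; the text does not fix these conventions, and all function names are hypothetical.

```python
import numpy as np

def rotation_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rotation_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rotation_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def project(x3d, angles, translation, f, offset, w, h):
    """Apply R_X, then R_Y, then R_Z, then T, then the pinhole projection P."""
    ax, ay, az = angles
    r = rotation_z(az) @ rotation_y(ay) @ rotation_x(ax) @ np.asarray(x3d, dtype=float)
    r = r + np.asarray(translation, dtype=float)
    # Equation (29): perspective division, scaling by image size, principal point offset.
    return np.array([w * f * r[0] / r[2] + offset[0],
                     h * f * r[1] / r[2] + offset[1]])
```

Because the rightmost map in the composition acts first, the matrix product is written in the same order as the composition in (28).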

Illumination The radiance $p_i$ of a point i on the face surface with normal $\mathbf {n}_i$ and albedo $a_i$ can be expressed using an expansion into real Spherical Harmonics basis functions $Y_{lm}$

$$\begin{aligned} p_i = a_i \sum _{l=0}^2 \sum _{m=-l}^l Y_{lm}(\mathbf {n}_i) L_{lm} k_l. \end{aligned}$$

(30)

The above equation is per color channel. The expansion of the environment map is captured in the illumination parameters $L_{lm}$, whereas the expansion of the Lambert reflectance kernel is given by $k_l$. For details, including the coefficient values $k_l$ of the expansion, refer to Basri and Jacobs (2003).
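As an illustration of (30) for a single color channel, the sketch below evaluates the nine real Spherical Harmonics basis functions up to band $l = 2$ at a unit normal and sums the weighted terms. The ordering of the nine coefficients and the kernel values passed as `k` are assumptions of this sketch; the values used in the paper are those of Basri and Jacobs (2003) and may be normalized differently.

```python
import numpy as np

# Band index l for each of the nine (l, m) pairs, in the ordering used below.
BAND = np.array([0, 1, 1, 1, 2, 2, 2, 2, 2])

def sh_basis(normal):
    """Real Spherical Harmonics up to l = 2, evaluated at a unit normal (x, y, z)."""
    x, y, z = normal
    return np.array([
        0.282095,                                        # l=0
        0.488603 * y, 0.488603 * z, 0.488603 * x,        # l=1, m=-1,0,1
        1.092548 * x * y, 1.092548 * y * z,              # l=2, m=-2,-1
        0.315392 * (3.0 * z * z - 1.0),                  # l=2, m=0
        1.092548 * x * z, 0.546274 * (x * x - y * y),    # l=2, m=1,2
    ])

def radiance(albedo, normal, L, k):
    """p_i = a_i * sum_lm Y_lm(n_i) * L_lm * k_l for one color channel."""
    k_per_coeff = np.asarray(k, dtype=float)[BAND]
    return albedo * np.sum(sh_basis(normal) * np.asarray(L, dtype=float) * k_per_coeff)

# Assumed kernel values under one common normalization (placeholder, see lead-in).
k_assumed = (np.pi, 2.0 * np.pi / 3.0, np.pi / 4.0)
ambient_only = np.array([1.0, 0, 0, 0, 0, 0, 0, 0, 0])
p = radiance(0.5, (0.0, 0.0, 1.0), ambient_only, k_assumed)
```

With only the $l = 0$ coefficient set, the radiance is independent of the normal, which corresponds to purely ambient light.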

The final image is produced by rasterization of all triangles in the face model. We use a Phong shading approach with a varying, interpolated normal for each pixel.

Illumination estimation Because the light model (30) is linear in the illumination coefficients for a given geometry and albedo, the expansion coefficients $L_{lm}^c$ of each color channel c are estimated by solving a linear least-squares system with one equation per vertex i, where the nine index pairs (l, m) are flattened into a single index $l'$

$$\begin{aligned} \sum _{l'=1}^9 Y_{l'}(\mathbf {n}_i) k_{l'} a_i^c L_{l'}^c = p_i^c. \end{aligned}$$

(31)

We solve the above system on 1000 randomly selected visible vertices i.
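A minimal sketch of this estimation step, assuming the basis values $Y_{l'}(\mathbf {n}_i)$ have already been evaluated for the visible vertices and stacked into a matrix. The function name, argument layout and the use of `numpy.linalg.lstsq` are choices made for this illustration, not the authors' implementation.

```python
import numpy as np

def estimate_illumination(basis, k, albedo, radiance, n_samples=1000, rng=None):
    """Least-squares estimate of the 9 illumination coefficients per color channel.

    basis:    (n, 9) values Y_l'(n_i) for the visible vertices (assumed precomputed)
    k:        (9,)   kernel coefficient k_l' for each flattened index
    albedo:   (n, 3) albedo a_i^c per vertex and channel
    radiance: (n, 3) observed p_i^c per vertex and channel
    """
    rng = np.random.default_rng() if rng is None else rng
    n = basis.shape[0]
    # Random subset of vertices, as in the text (1000 by default).
    idx = rng.choice(n, size=min(n_samples, n), replace=False)
    coeffs = np.empty((9, 3))
    for c in range(3):
        # Row i of the system matrix: Y_l'(n_i) * k_l' * a_i^c, see equation (31).
        A = basis[idx] * np.asarray(k)[None, :] * albedo[idx, c][:, None]
        solution = np.linalg.lstsq(A, radiance[idx, c], rcond=None)[0]
        coeffs[:, c] = solution
    return coeffs
```

Each channel is solved independently because (31) couples only coefficients of the same channel.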

Product likelihood normalization The distribution is centered at the color value of the synthetic image and normalized to account for the truncation due to limited intensity values through

$$\begin{aligned} N= \int _{[0,1]^3} \exp \left( -\frac{\left\Vert t - I_i(\theta ) \right\Vert ^2}{2 \sigma ^2} \right) dt_R \, dt_G \, dt_B \end{aligned}$$

(32)

which can be calculated using the error function. As an approximation, the normalization can be replaced by a standard Gaussian normalization if the standard deviation is much smaller than the width of the intensity range and the color channels are neither saturated nor zero. This is a valid assumption for the majority of typical face images.
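Since the exponent in (32) separates over the three color channels, the integral factorizes into three one-dimensional integrals, each expressible with the error function. The sketch below computes this product and the untruncated Gaussian normalization for comparison; the function names are hypothetical.

```python
import math

def truncated_gaussian_norm(mean_rgb, sigma):
    """Normalization N of equation (32) for an isotropic Gaussian truncated to [0, 1]^3."""
    total = 1.0
    scale = sigma * math.sqrt(2.0)
    for mu in mean_rgb:
        # Integral of exp(-(t - mu)^2 / (2 sigma^2)) over t in [0, 1].
        upper = math.erf((1.0 - mu) / scale)
        lower = math.erf((0.0 - mu) / scale)
        total *= sigma * math.sqrt(math.pi / 2.0) * (upper - lower)
    return total

def gaussian_norm(sigma):
    """Normalization of the untruncated three-dimensional Gaussian."""
    return (sigma * math.sqrt(2.0 * math.pi)) ** 3

# Ratio close to 1 for a mid-range color, close to 0.5 when one channel is zero.
ratio_mid = truncated_gaussian_norm((0.5, 0.5, 0.5), 0.05) / gaussian_norm(0.05)
ratio_dark = truncated_gaussian_norm((0.0, 0.5, 0.5), 0.05) / gaussian_norm(0.05)
```

The second ratio shows why the approximation fails for saturated or zero channels: half of the Gaussian mass of that channel lies outside the valid interval.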

Appendix 2: Random Walk Proposals

The standard random walk proposal type is a Gaussian update:

$$\begin{aligned} Q:~~\theta '&= \theta + d \qquad d \sim {\mathcal {N}}(d|0, \sigma ). \end{aligned}$$

(33)
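The Gaussian update (33) can be sketched together with the Metropolis acceptance rule mentioned in the abstract. This is a toy illustration on an arbitrary log-density, not the face fitting algorithm; `log_target` and the other names are hypothetical.

```python
import numpy as np

def random_walk_proposal(theta, sigma, rng):
    """Equation (33): theta' = theta + d with d ~ N(0, sigma), sigma a standard deviation."""
    return theta + rng.normal(0.0, sigma, size=np.shape(theta))

def metropolis(log_target, theta0, sigma, n_steps, rng):
    """Random-walk Metropolis sampler; valid because the Gaussian proposal is symmetric."""
    theta = np.asarray(theta0, dtype=float)
    logp = log_target(theta)
    samples = []
    for _ in range(n_steps):
        cand = random_walk_proposal(theta, sigma, rng)
        cand_logp = log_target(cand)
        # Accept with probability min(1, p(cand) / p(theta)).
        if rng.uniform() < np.exp(min(0.0, cand_logp - logp)):
            theta, logp = cand, cand_logp
        samples.append(theta.copy())
    return np.array(samples)

# Toy target: a standard normal distribution in one dimension.
rng = np.random.default_rng(0)
chain = metropolis(lambda t: -0.5 * float(np.sum(t ** 2)), [0.0], 1.0, 20000, rng)
```

Rejected proposals repeat the current state in the chain, which is what makes the sample frequencies follow the target distribution.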

Camera model The proposals change three Euler angles of rotation, three directions of translation, the principal point in the image plane and the focal length. All of these are updated independently, only one at a time, using a selected variance for each. Additionally, 3D rotation proposals are compensated for unwanted movements of the face within the image plane such that it is kept at a fixed position in the image.

Table 11 Random walk proposals: $\sigma $ is the standard deviation of the normal distribution, centered at the current location. $\lambda $ designates mixture coefficients of the different scales coarse (C), intermediate (I) and fine (F). The values are obtained empirically


Face model The updates of the 3DMM’s shape and texture models consist of two types of parameter variations. First, there is the addition of uncorrelated Gaussian noise to all parameters. Second, there is a scaling of the total parameter vector length with a log-normal distribution (distance from mean, “caricature”). Proposals are generated by

$$\begin{aligned} Q_S:~~{\varvec{\theta }}'_{\text {S}}&= {\varvec{\theta }}_{\text {S}} + d \qquad d \sim {\mathcal {N}}(d|0, \sigma _{\text {S}}\mathbf {I}) \end{aligned}$$

(34)

$$\begin{aligned} {\varvec{\theta }}'_{\text {S}}&= {\varvec{\theta }}_{\text {S}} \times \lambda \qquad \lambda \sim \log {\mathcal {N}}(1, \sigma _{\text {SL}}). \end{aligned}$$

(35)
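The two proposal types (34) and (35) might look as follows in code. Reading $\log {\mathcal {N}}(1, \sigma _{\text {SL}})$ as a log-normal distribution with median 1 (log-mean 0) is an assumption of this sketch, as are the function names and the example standard deviations.

```python
import numpy as np

def shape_noise_proposal(theta, sigma_s, rng):
    """Equation (34): add uncorrelated Gaussian noise to all parameters."""
    return theta + rng.normal(0.0, sigma_s, size=theta.shape)

def caricature_proposal(theta, sigma_sl, rng):
    """Equation (35): scale the whole parameter vector by a log-normal factor."""
    # Assumption: log-mean 0, so the factor has median 1.
    lam = rng.lognormal(mean=0.0, sigma=sigma_sl)
    return theta * lam

rng = np.random.default_rng(1)
theta = np.array([0.5, -1.0, 2.0])
theta_noise = shape_noise_proposal(theta, 0.1, rng)      # hypothetical sigma_S
theta_scaled = caricature_proposal(theta, 0.2, rng)      # hypothetical sigma_SL
```

The scaling move changes only the distance from the mean and keeps the direction of the parameter vector. Unlike the additive Gaussian move, it is not symmetric in the parameters themselves, so a Hastings correction may be needed when it is used in a sampler; the text here does not say how this is handled.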

Illumination The illumination coefficients are updated with a mixture of a perturbation, an intensity and a color proposal. The perturbation is a standard independent Gaussian acting on all coefficients at once. The intensity proposal scales all coefficients by a factor drawn from a log-normal distribution, and the color proposal keeps the intensity constant while perturbing the coefficients.

Table 11 contains a detailed overview.


About this article

Cite this article

Schönborn, S., Egger, B., Morel-Forster, A. et al. Markov Chain Monte Carlo for Automated Face Image Analysis. Int J Comput Vis 123, 160–183 (2017). https://doi.org/10.1007/s11263-016-0967-5


Received: 07 July 2015
Accepted: 17 October 2016
Published: 02 November 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s11263-016-0967-5
