International Journal of Computer Vision, Volume 76, Issue 2, pp 183–204

Multi-View AAM Fitting and Construction

  • Krishnan Ramnath
  • Seth Koterba
  • Jing Xiao
  • Changbo Hu
  • Iain Matthews
  • Simon Baker
  • Jeffrey Cohn
  • Takeo Kanade
Article

Abstract

Active Appearance Models (AAMs) are generative, parametric models that have been used successfully to model deformable objects such as human faces. The original AAM formulation was 2D, but AAMs have recently been extended to include a 3D shape model. A variety of single-view algorithms exist for fitting and constructing 3D AAMs, but multi-view algorithms have not been studied. In this paper we present multi-view algorithms for both fitting and constructing 3D AAMs.
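As background, "generative, parametric" in the AAM literature usually refers to linear models of shape and appearance. The following is a sketch in assumed notation; none of these symbols are defined in this abstract. Here \mathbf{s} is the 2D mesh shape, A(\mathbf{u}) the appearance image over pixels \mathbf{u} in the mean shape, \bar{\mathbf{s}} the 3D shape, and p_i, \lambda_i, \bar{p}_i the model parameters.

```latex
\mathbf{s} = \mathbf{s}_0 + \sum_{i=1}^{n} p_i \, \mathbf{s}_i, \qquad
A(\mathbf{u}) = A_0(\mathbf{u}) + \sum_{i=1}^{m} \lambda_i \, A_i(\mathbf{u}), \qquad
\bar{\mathbf{s}} = \bar{\mathbf{s}}_0 + \sum_{i=1}^{\bar{n}} \bar{p}_i \, \bar{\mathbf{s}}_i
```

A model instance is generated by choosing parameter values, which is what makes fitting an optimization over those parameters.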

Fitting an AAM to an image consists of minimizing the error between the input image and the closest model instance, i.e., solving a nonlinear optimization problem. In the first part of the paper we describe an algorithm for fitting a single AAM to multiple images captured simultaneously by cameras with arbitrary locations, rotations, and response functions. This algorithm uses the scaled orthographic imaging model used by previous authors and, in the process of fitting, computes (calibrates) the scaled orthographic camera matrices. In the second part of the paper we describe an extension of this algorithm that calibrates weak perspective (or full perspective) camera models for each of the cameras. In essence, we use the human face as a (non-rigid) calibration grid. We demonstrate that the performance of this algorithm is roughly comparable to that of a standard algorithm using a calibration grid. In the third part of the paper, we show how camera calibration improves the performance of AAM fitting.
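The idea of fitting one model to several simultaneous views can be illustrated with a small sketch. It is not the paper's algorithm: it assumes known cameras, replaces the image appearance error with a landmark reprojection error, and uses a toy one-mode 3D shape model. All names (`shape_instance`, `project_scaled_orthographic`, `multiview_error`) and all numbers are hypothetical.

```python
import math

def rot_y(theta):
    """Rotation matrix about the vertical (y) axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def shape_instance(mean_shape, modes, params):
    """Linear 3D shape model: mean shape plus weighted sum of shape modes."""
    return [
        tuple(pt[d] + sum(p * mode[k][d] for p, mode in zip(params, modes))
              for d in range(3))
        for k, pt in enumerate(mean_shape)
    ]

def project_scaled_orthographic(shape, camera):
    """Rotate, drop depth, then apply a uniform scale and a 2D translation."""
    R, scale, (tx, ty) = camera["R"], camera["scale"], camera["t"]
    return [
        (scale * sum(R[0][d] * X[d] for d in range(3)) + tx,
         scale * sum(R[1][d] * X[d] for d in range(3)) + ty)
        for X in shape
    ]

def multiview_error(params, cameras, observations, mean_shape, modes):
    """Sum of squared landmark errors over all views, with shared shape params."""
    shape = shape_instance(mean_shape, modes, params)
    total = 0.0
    for cam, obs in zip(cameras, observations):
        for (u, v), (uo, vo) in zip(project_scaled_orthographic(shape, cam), obs):
            total += (u - uo) ** 2 + (v - vo) ** 2
    return total

# Toy model (assumed values): four points and one mode that only changes depth.
mean_shape = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.5)]
modes = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]]
cameras = [
    {"R": rot_y(0.0), "scale": 1.5, "t": (10.0, 20.0)},  # frontal view
    {"R": rot_y(0.6), "scale": 2.0, "t": (-5.0, 3.0)},   # rotated view
]

# Synthesize noise-free observations from a known parameter value.
true_params = [0.5]
true_shape = shape_instance(mean_shape, modes, true_params)
observations = [project_scaled_orthographic(true_shape, cam) for cam in cameras]

# Coarse 1D search over the single shape parameter, using both views jointly.
candidates = [i / 10 for i in range(-10, 11)]
best = min(candidates,
           key=lambda p: multiview_error([p], cameras, observations, mean_shape, modes))

# The frontal view alone cannot see the depth-only mode: its error stays zero.
frontal_only_error = multiview_error([0.0], cameras[:1], observations[:1],
                                     mean_shape, modes)
```

In this toy setup the frontal camera is blind to the depth mode, so only the joint error over both views pins down the shared shape parameter. A real fitter would also estimate the cameras and minimize an appearance error, typically with a gradient-based method rather than a grid search.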

A variety of non-rigid structure-from-motion algorithms, both single-view and multi-view, have been proposed that can be used to construct the corresponding 3D non-rigid shape models of a 2D AAM. In the final part of the paper, we show that constructing a 3D face model using non-rigid structure-from-motion suffers from the Bas-Relief ambiguity and may result in a “scaled” (stretched/compressed) model. We outline a robust non-rigid motion-stereo algorithm for calibrated multi-view 3D AAM construction and show how using calibrated multi-view motion-stereo can eliminate the Bas-Relief ambiguity and yield face models with higher 3D fidelity.
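The Bas-Relief ambiguity can be seen in a minimal numeric sketch. It assumes a rigid point set, orthographic projection, noise-free data, and rotation about the vertical axis only; the paper's setting is non-rigid and its algorithm is more involved. The point coordinates, angles, and names such as `project_x` are hypothetical.

```python
import math

def project_x(points, angle, depth_scale=1.0):
    """Orthographic image x-coordinates after rotating about the y axis.

    `depth_scale` stretches the depth (z) of every point before rotating.
    Image y-coordinates are unchanged by this rotation, so they are omitted.
    """
    c, s = math.cos(angle), math.sin(angle)
    return [c * x + s * depth_scale * z for x, _y, z in points]

def max_abs_diff(a, b):
    """Largest absolute difference between two coordinate lists."""
    return max(abs(p - q) for p, q in zip(a, b))

# Assumed toy data: a few 3D points, a small head rotation between frames,
# a depth stretch factor, and the known angle between two calibrated cameras.
points = [(-1.0, 0.0, 0.2), (1.0, 0.0, 0.3), (0.0, 1.0, 1.0), (0.5, -1.0, 0.6)]
theta = 0.05      # small rotation between frames (radians)
k = 2.0           # hypothesized depth stretch
baseline = 0.8    # known rotation between the two cameras (radians)

# Single view: true shape rotated by theta vs. stretched shape rotated by theta/k.
mono_gap = max_abs_diff(project_x(points, theta),
                        project_x(points, theta / k, depth_scale=k))

# Second calibrated view of the same frame: the stretch is now clearly visible.
stereo_gap = max_abs_diff(project_x(points, baseline),
                          project_x(points, baseline, depth_scale=k))
```

With a small rotation, stretching depth by `k` and shrinking the rotation by `1/k` changes the single-view image only at second order in `theta`, so `mono_gap` is tiny and the two hypotheses are hard to tell apart. A second camera at a known, wide angle sees the stretched depth directly, so `stereo_gap` is large.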

Keywords

  • Active appearance models
  • Multi-view 3D face model construction
  • Multi-view AAM fitting
  • Non-rigid structure-from-motion
  • Motion-stereo
  • Camera calibration

Supplementary material

11263_2007_50_MOESM1_ESM.avi (5.9 MB)
Video file
11263_2007_50_MOESM2_ESM.avi (7.3 MB)
Video file
11263_2007_50_MOESM3_ESM.avi (17.4 MB)
Video file
11263_2007_50_MOESM4_ESM.avi (17.4 MB)
Video file

Video file

11263_2007_50_MOESM6_ESM.avi (5.5 MB)
Video file

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Krishnan Ramnath (1)
  • Seth Koterba (2)
  • Jing Xiao (3)
  • Changbo Hu (2)
  • Iain Matthews (2)
  • Simon Baker (4)
  • Jeffrey Cohn (5)
  • Takeo Kanade (2)

  1. Objectvideo Inc., Reston, USA
  2. The Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
  3. Epson Palo Alto Laboratory, Epson Research & Development, San Jose, USA
  4. Microsoft Research, Microsoft Corporation, Redmond, USA
  5. Department of Psychology, University of Pittsburgh, Pittsburgh, USA
