International Journal of Computer Vision, Volume 29, Issue 2, pp 107–131

Multidimensional Morphable Models: A Framework for Representing and Matching Object Classes

  • Michael J. Jones
  • Tomaso Poggio
Article

Abstract

We describe a flexible model for representing images of objects of a certain class, known a priori, such as faces, and introduce a new algorithm for matching it to a novel image, thereby performing image analysis. The flexible model, known as a multidimensional morphable model, is learned from example images of objects of a class. In this paper we introduce an effective stochastic gradient descent algorithm that automatically matches a model to a novel image. Several experiments demonstrate the robustness and the broad range of applicability of morphable models. Our approach can provide novel solutions to several vision tasks, including the computation of image correspondence, object verification and image compression.
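
The matching step can be illustrated with a deliberately simplified sketch. It is not the algorithm of the paper: it assumes a texture-only model in which a novel image is a linear combination of prototype images, it omits the shape (warping) component of a morphable model, and it uses tiny one-dimensional "images". The names `match_model`, `prototypes`, `novel`, and all parameter values are hypothetical. At each step a small random subset of pixels is sampled and the coefficients are moved down the gradient of the squared error on those pixels, which is the basic idea of stochastic gradient descent.

```python
import random


def match_model(prototypes, novel, steps=5000, batch=4, lr=0.05, seed=0):
    """Estimate coefficients b so that sum_j b[j] * prototypes[j] approximates novel.

    Simplifying assumption: texture-only linear model, no shape/warp parameters.
    """
    rng = random.Random(seed)
    n_pixels = len(novel)
    b = [0.0] * len(prototypes)
    for _ in range(steps):
        # Sample a small random subset of pixels instead of using the whole image.
        pixels = [rng.randrange(n_pixels) for _ in range(batch)]
        grad = [0.0] * len(b)
        for p in pixels:
            predicted = sum(bj * proto[p] for bj, proto in zip(b, prototypes))
            residual = predicted - novel[p]
            # Gradient of the mean squared error over the sampled pixels.
            for j, proto in enumerate(prototypes):
                grad[j] += 2.0 * residual * proto[p] / batch
        b = [bj - lr * gj for bj, gj in zip(b, grad)]
    return b


# Hypothetical toy data: two prototype "images" of six pixels each.
prototypes = [
    [0.9, 0.8, 0.4, 0.2, 0.1, 0.3],
    [0.1, 0.3, 0.6, 0.9, 0.7, 0.2],
]
true_coeffs = [0.3, 0.7]
novel = [
    sum(c * proto[p] for c, proto in zip(true_coeffs, prototypes))
    for p in range(6)
]

estimated = match_model(prototypes, novel)
```

Because the toy image is generated exactly from the prototypes, `estimated` should approach `true_coeffs`. With real images the fit is only approximate, and a full morphable model would also have to estimate shape parameters.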

object representations, image analysis, correspondence, object recognition


Copyright information

© Kluwer Academic Publishers 1998

Authors and Affiliations

  • Michael J. Jones (1)
  • Tomaso Poggio (2)
  1. Cambridge Research Lab, Digital Equipment Corp., One, Cambridge
  2. Artificial Intelligence Lab and The Center for Biological and Computational Learning, Massachusetts Institute of Technology, Cambridge
