Learning Montages of Transformed Latent Images as Representations of Objects That Change in Appearance

  • Chris Pal
  • Brendan J. Frey
  • Nebojsa Jojic
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2353)


This paper introduces a novel probabilistic model for representing objects that change in appearance as a result of changes in pose, small deformations of their sub-parts, and the relative spatial transformation of those sub-parts. We call the model a probabilistic montage. The model is based on the idea that an image can be represented as a montage of many small, transformed and cropped patches taken from a collection of latent images. The approach is similar to one a police artist might employ to represent an image of a criminal suspect’s face as a montage of face parts cut out of a “library” of face parts. In contrast, in our model the library of small latent images is learned from a set of examples of objects that are changing in shape. In our approach, the image is first divided into a grid of sub-images. Each sub-image in the grid acts as a window that crops a piece out of one of a collection of slightly larger latent images available for that location in the image. We illustrate various probability models that can be used to encode the appropriate relationships for latent images and cropping transformations among the different patches. In this paper we present the complete algorithm for a tree-structured model. We show how the approach and model are able to find representations of the appearance of full-body images of people in motion. Finally, we show how our approach can be used to learn representations of objects in an “unsupervised” manner, and we present results using our model for recognition and tracking in a “supervised” manner.
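The grid-and-crop idea described above can be illustrated with a minimal sketch. It assumes square grid cells of a fixed size, latent images a few pixels larger than each cell, crops restricted to integer shifts, a uniform prior over choices, and independent Gaussian pixel noise. All sizes and names (`render_montage`, `cell_posterior`, `sigma`, and so on) are hypothetical, and the sketch treats every cell independently, ignoring the dependencies among patches that the paper encodes with its tree-structured model. It renders a montage from per-cell choices of latent image and crop offset, then computes, for each cell, the posterior over those choices; in an EM-style learning procedure, posteriors like these would weight the updates to the latent images.

```python
import math
import random

# Illustrative sizes (assumptions, not values from the paper).
PATCH = 4                  # side of each grid cell, i.e. the cropping window
MARGIN = 2                 # extra pixels that make latent images "slightly larger"
LATENT = PATCH + MARGIN    # side of each latent image


def random_image(n):
    """An n x n image of uniform random intensities (stand-in for a learned latent image)."""
    return [[random.random() for _ in range(n)] for _ in range(n)]


def offsets():
    """Every integer crop offset (dy, dx) that keeps the window inside a latent image."""
    return [(dy, dx) for dy in range(MARGIN + 1) for dx in range(MARGIN + 1)]


def crop(img, dy, dx):
    """Cut the PATCH x PATCH window at offset (dy, dx) out of a latent image."""
    return [row[dx:dx + PATCH] for row in img[dy:dy + PATCH]]


def cell_patch(image, i, j):
    """Return the observed sub-image in grid cell (i, j)."""
    return [row[j * PATCH:(j + 1) * PATCH] for row in image[i * PATCH:(i + 1) * PATCH]]


def render_montage(library, choices):
    """Assemble an image cell by cell (hypothetical generative sketch).

    library[i][j] is the list of latent images available at cell (i, j);
    choices[i][j] = (k, dy, dx) selects latent image k and a crop offset.
    """
    rows, cols = len(choices), len(choices[0])
    out = [[0.0] * (cols * PATCH) for _ in range(rows * PATCH)]
    for i in range(rows):
        for j in range(cols):
            k, dy, dx = choices[i][j]
            patch = crop(library[i][j][k], dy, dx)
            for r in range(PATCH):
                # Paste one row of the cropped window into the output image.
                out[i * PATCH + r][j * PATCH:(j + 1) * PATCH] = patch[r]
    return out


def cell_posterior(observed, latents, sigma=0.1):
    """Posterior over (k, dy, dx) for one cell, assuming a uniform prior and
    independent Gaussian pixel noise with standard deviation sigma."""
    logp = {}
    for k, img in enumerate(latents):
        for dy, dx in offsets():
            sq = sum((o - p) ** 2
                     for orow, prow in zip(observed, crop(img, dy, dx))
                     for o, p in zip(orow, prow))
            logp[(k, dy, dx)] = -sq / (2 * sigma ** 2)
    top = max(logp.values())  # subtract the max log-probability for numerical stability
    z = sum(math.exp(v - top) for v in logp.values())
    return {key: math.exp(v - top) / z for key, v in logp.items()}


if __name__ == "__main__":
    random.seed(0)
    grid, n_latent = 2, 3
    library = [[[random_image(LATENT) for _ in range(n_latent)]
                for _ in range(grid)] for _ in range(grid)]
    truth = [[(random.randrange(n_latent), random.randrange(MARGIN + 1),
               random.randrange(MARGIN + 1)) for _ in range(grid)] for _ in range(grid)]
    image = render_montage(library, truth)
    for i in range(grid):
        for j in range(grid):
            post = cell_posterior(cell_patch(image, i, j), library[i][j])
            print((i, j), "true", truth[i][j], "MAP", max(post, key=post.get))
```

Because the observed image here is rendered without noise, the most probable choice in each cell should coincide with the choice used to render it. Learning the library itself, for example by re-estimating each latent image from posterior-weighted crops, is not shown.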





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Chris Pal, Dept. Computer Science, University of Waterloo, Waterloo, Canada
  • Brendan J. Frey, Dept. Electrical and Computer Engineering, University of Toronto, Toronto, Canada
  • Nebojsa Jojic, Microsoft Research, Redmond, USA
