International Journal of Computer Vision, Volume 75, Issue 2, pp 267–282

POP: Patchwork of Parts Models for Object Recognition



We formulate a deformable template model for objects with an efficient mechanism for computation and parameter estimation. The data consist of binary oriented edge features that are robust to photometric variation and small local deformations. The template is defined in terms of probability arrays for each edge type. A primary contribution of this paper is the definition of the instantiation of an object in terms of shifts of a moderate number of local submodels (parts), which are subsequently recombined using a patchwork operation to define a coherent statistical model of the data. Object classes are modeled as mixtures of patchwork of parts (POP) models that are discovered sequentially as more class data is observed. We define the notion of the support associated with an instantiation and use it to formulate statistical models for multi-object configurations, including possible occlusions. All decisions on the labeling of the objects in the image are based on comparing likelihoods. The combination of a deformable model with an efficient estimation procedure yields competitive results in a variety of applications with very small training sets and without the need to train decision boundaries: only data from the class being trained is used. Experiments are presented on the MNIST database, zip code reading, and face detection.
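The abstract does not spell out the patchwork operation, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each part is a small array of edge probabilities, that shifted parts are averaged wherever they overlap, that pixels covered by no part receive a constant background probability, and that binary edges are conditionally independent Bernoulli variables. The names `patchwork`, `log_likelihood`, and `background` are hypothetical.

```python
import numpy as np


def patchwork(parts, shifts, shape, background=0.05):
    """Recombine shifted part probability arrays into one probability map.

    parts  : list of (top, left, probs); probs has shape (n_edge_types, h, w)
    shifts : list of integer (dy, dx), one shift per part
    shape  : (n_edge_types, H, W) of the full probability array
    Returns the combined array and a boolean (H, W) support mask.
    Averaging over overlaps and a flat background are assumptions.
    """
    total = np.zeros(shape)
    count = np.zeros(shape[1:])
    for (top, left, probs), (dy, dx) in zip(parts, shifts):
        _, h, w = probs.shape
        y0, x0 = top + dy, left + dx
        if y0 < 0 or x0 < 0 or y0 + h > shape[1] or x0 + w > shape[2]:
            raise ValueError("shifted part falls outside the image")
        total[:, y0:y0 + h, x0:x0 + w] += probs
        count[y0:y0 + h, x0:x0 + w] += 1
    support = count > 0
    combined = np.full(shape, background, dtype=float)
    # Average the contributions of all parts that cover a pixel.
    combined[:, support] = total[:, support] / count[support]
    return combined, support


def log_likelihood(edges, probs, eps=1e-6):
    """Bernoulli log-likelihood of binary edge data under a probability array."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(np.sum(edges * np.log(p) + (1.0 - edges) * np.log(1.0 - p)))
```

Under these assumptions, labeling by comparing likelihoods amounts to evaluating `log_likelihood` for each candidate instantiation (a choice of class model and part shifts) and keeping the largest value.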


Keywords: deformable models, model estimation, multi-object configurations, object detection





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Department of Statistics and the Department of Computer Science, University of Chicago, Chicago, USA
  2. CMLA at the Ecole Normale Supérieure, Cachan, Cachan Cedex, France
