POP: Patchwork of Parts Models for Object Recognition
We formulate a deformable template model for objects with an efficient mechanism for computation and parameter estimation. The data consist of binary oriented edge features, which are robust to photometric variation and small local deformations. The template is defined in terms of probability arrays for each edge type. A primary contribution of this paper is the definition of the instantiation of an object in terms of shifts of a moderate number of local submodels (parts), which are subsequently recombined using a patchwork operation to define a coherent statistical model of the data. Object classes are modeled as mixtures of patchwork of parts (POP) models that are discovered sequentially as more class data is observed. We define the notion of the support associated with an instantiation, and use this to formulate statistical models for multi-object configurations including possible occlusions. All decisions on the labeling of the objects in the image are based on comparing likelihoods. The combination of a deformable model with an efficient estimation procedure yields competitive results in a variety of applications with very small training sets, with no need to train decision boundaries: only data from the class being trained is used. Experiments are presented on the MNIST database, zip code reading, and face detection.
Keywords: deformable models, model estimation, multi-object configurations, object detection
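To make the ideas in the abstract concrete, the following is a minimal one-dimensional sketch, not the paper's implementation. It assumes that the patchwork operation averages the edge probabilities of the parts covering a position and uses a fixed background probability elsewhere, that features are independent Bernoulli variables given the instantiation, and that the best shifts are found by brute-force search rather than by the efficient mechanism the abstract mentions. All function names, the background value, and the toy data are hypothetical.

```python
import math
from itertools import product


def patchwork(length, parts, shifts, background=0.05):
    """Recombine shifted part models into one probability array.

    parts  : list of (start, probs) pairs; `start` is the reference
             position of the part, `probs` its local edge probabilities.
    shifts : one integer shift per part (the instantiation).
    Assumption: overlapping parts are averaged, and positions covered
    by no part receive the background probability.
    """
    total = [0.0] * length
    count = [0] * length
    for (start, probs), shift in zip(parts, shifts):
        for k, p in enumerate(probs):
            pos = start + shift + k
            if 0 <= pos < length:
                total[pos] += p
                count[pos] += 1
    return [t / c if c else background for t, c in zip(total, count)]


def log_likelihood(x, probs, eps=1e-6):
    """Log-likelihood of binary features x under independent Bernoullis."""
    ll = 0.0
    for xi, p in zip(x, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        ll += math.log(p) if xi else math.log(1.0 - p)
    return ll


def best_instantiation(x, parts, max_shift=1):
    """Brute-force search for the shifts that maximize the likelihood."""
    candidates = product(range(-max_shift, max_shift + 1), repeat=len(parts))
    return max(
        candidates,
        key=lambda s: log_likelihood(x, patchwork(len(x), parts, s)),
    )


def classify(x, class_parts, max_shift=1):
    """Label x by comparing the best likelihood under each class model."""
    def score(label):
        parts = class_parts[label]
        shifts = best_instantiation(x, parts, max_shift)
        return log_likelihood(x, patchwork(len(x), parts, shifts))
    return max(class_parts, key=score)


# Toy example: class "a" has two two-pixel parts, class "b" has one.
class_parts = {
    "a": [(0, [0.9, 0.9]), (3, [0.9, 0.9])],
    "b": [(2, [0.9, 0.9])],
}
x = [0, 1, 1, 1, 1, 0]  # class "a" with its first part shifted right by one
print(best_instantiation(x, class_parts["a"]))
print(classify(x, class_parts))
```

Each class model here is built only from that class's own parts, and the label is chosen by comparing likelihoods, which mirrors the decision rule described in the abstract. A mixture of several such models per class would add one more maximization over mixture components.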