Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition

International Journal of Computer Vision

Abstract

We investigate a method for learning object categories in a weakly supervised manner. Given a set of images known to contain the target category from a similar viewpoint, learning is translation- and scale-invariant, does not require alignment or correspondence between the training images, and is robust to clutter and occlusion. Category models are probabilistic constellations of parts, and their parameters are estimated by maximizing the likelihood of the training data. The appearance of the parts, as well as their mutual position, relative scale and probability of detection, are explicitly described in the model. Recognition takes place in two stages. First, a feature-finder identifies promising locations for the model's parts. Second, the category model is used to compare the likelihood that the observed features were generated by the category model with the likelihood that they were generated by background clutter. The flexibility of the model is demonstrated by results on six diverse object categories, including geometrically constrained categories (e.g. faces, cars) and flexible objects (such as animals).
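The second recognition stage amounts to a likelihood-ratio test. The Python sketch below illustrates that idea for a single hypothesis, i.e. one detected feature assigned to each part. It is not the paper's model: occlusion handling, the search over hypotheses, relative-scale terms, detection probabilities and maximum-likelihood learning of the parameters are all omitted. The normalization scheme (positions taken relative to the first part and divided by the mean feature scale), the diagonal Gaussian densities, the uniform background shape density, every class and function name (`Feature`, `ForegroundModel`, `BackgroundModel`, `normalize_shape`, `log_likelihood_ratio`, `is_object`) and all numbers are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Feature:
    """A detected region: position, characteristic scale, appearance descriptor."""
    x: float
    y: float
    scale: float
    appearance: Tuple[float, ...]


@dataclass
class ForegroundModel:
    shape_mean: List[float]        # 2*(P-1) normalized part coordinates
    shape_var: List[float]         # diagonal variances (an assumption of this sketch)
    app_means: List[List[float]]   # one appearance mean per part
    app_vars: List[List[float]]    # one diagonal variance vector per part


@dataclass
class BackgroundModel:
    shape_range: float             # assumed uniform density 1/shape_range per coordinate
    app_mean: List[float]          # a single appearance density shared by all clutter
    app_var: List[float]


def diag_gaussian_logpdf(v: Sequence[float], mean: Sequence[float],
                         var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * s2) + (vi - mi) ** 2 / s2)
               for vi, mi, s2 in zip(v, mean, var))


def normalize_shape(parts: List[Feature]) -> List[float]:
    """Positions relative to the first part, divided by the mean feature scale.

    Shifting every feature, or multiplying every position and scale by the
    same factor, leaves the output unchanged. The first part always lands at
    the origin, so its coordinates are dropped.
    """
    ref = parts[0]
    s = sum(f.scale for f in parts) / len(parts)
    coords: List[float] = []
    for f in parts[1:]:
        coords += [(f.x - ref.x) / s, (f.y - ref.y) / s]
    return coords


def log_likelihood_ratio(parts: List[Feature], fg: ForegroundModel,
                         bg: BackgroundModel) -> float:
    """log p(features | object) - log p(features | clutter) for one hypothesis."""
    shape = normalize_shape(parts)
    fg_ll = diag_gaussian_logpdf(shape, fg.shape_mean, fg.shape_var)
    bg_ll = -len(shape) * math.log(bg.shape_range)
    for f, mu, var in zip(parts, fg.app_means, fg.app_vars):
        fg_ll += diag_gaussian_logpdf(f.appearance, mu, var)
        bg_ll += diag_gaussian_logpdf(f.appearance, bg.app_mean, bg.app_var)
    return fg_ll - bg_ll


def is_object(parts: List[Feature], fg: ForegroundModel, bg: BackgroundModel,
              threshold: float = 0.0) -> bool:
    """Accept the hypothesis when the object model explains it better than clutter."""
    return log_likelihood_ratio(parts, fg, bg) > threshold


# Usage with made-up parameters for a hypothetical three-part model.
fg = ForegroundModel(shape_mean=[2.0, 0.0, 1.0, 2.0], shape_var=[0.1] * 4,
                     app_means=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                     app_vars=[[0.05, 0.05]] * 3)
bg = BackgroundModel(shape_range=10.0, app_mean=[0.0, 0.0], app_var=[1.0, 1.0])

object_like = [Feature(0, 0, 1, (1.0, 0.0)), Feature(2, 0, 1, (0.0, 1.0)),
               Feature(1, 2, 1, (1.0, 1.0))]
# The same configuration shifted by (10, -4) and enlarged 3x.
moved = [Feature(3 * f.x + 10, 3 * f.y - 4, 3 * f.scale, f.appearance)
         for f in object_like]
clutter = [Feature(0, 0, 1, (0.0, 0.0)), Feature(7, -5, 1, (0.0, 0.0)),
           Feature(-4, 6, 1, (0.0, 0.0))]

print(is_object(object_like, fg, bg), is_object(moved, fg, bg),
      is_object(clutter, fg, bg))  # True True False
```

Because the shape term only sees normalized coordinates, translating and uniformly rescaling every feature leaves the score unchanged, which is the sense in which this toy test is translation- and scale-invariant. Working in log space avoids underflow when many small densities are multiplied.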

Author information

Correspondence to R. Fergus.

About this article

Cite this article

Fergus, R., Perona, P. & Zisserman, A. Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition. Int J Comput Vision 71, 273–303 (2007). https://doi.org/10.1007/s11263-006-8707-x

  • DOI: https://doi.org/10.1007/s11263-006-8707-x