Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition

Fergus, R.; Perona, P.; Zisserman, A.

doi:10.1007/s11263-006-8707-x

Published: 01 July 2006

Volume 71, pages 273–303 (2007)
International Journal of Computer Vision

R. Fergus¹, P. Perona² & A. Zisserman¹

Abstract

We investigate a method for learning object categories in a weakly supervised manner. Given a set of images known to contain the target category from a similar viewpoint, learning is translation- and scale-invariant, does not require alignment or correspondence between the training images, and is robust to clutter and occlusion. Category models are probabilistic constellations of parts, and their parameters are estimated by maximizing the likelihood of the training data. The appearance of the parts, as well as their mutual position, relative scale and probability of detection, are explicitly described in the model. Recognition takes place in two stages. First, a feature-finder identifies promising locations for the model's parts. Second, the category model is used to compare the likelihood that the observed features are generated by the category model with the likelihood that they are generated by background clutter. The flexible nature of the model is demonstrated by results over six diverse object categories, including geometrically constrained categories (e.g. faces, cars) and flexible objects (such as animals).
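
The second recognition stage described in the abstract compares two explanations of the detected features: the category model and background clutter. A minimal Python sketch of that comparison follows. It is an illustration under strong assumptions, not the authors' implementation: it scores part locations only (no appearance, relative scale or occlusion terms), treats clutter as uniform over the image, weights all part-to-feature hypotheses equally, and uses hypothetical names (`likelihood_ratio`, `part_offsets`, `var`, `image_area`) that do not come from the article.

```python
import itertools
import math


def gaussian_2d(point, mean, var):
    """Isotropic 2D Gaussian density with variance `var` on each axis."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * var)) / (2.0 * math.pi * var)


def likelihood_ratio(features, part_offsets, var, image_area):
    """Toy ratio p(features | object) / p(features | background clutter).

    features     : list of (x, y) locations from some feature finder
    part_offsets : assumed mean (x, y) of parts 1..P-1 relative to part 0
    var          : assumed location variance shared by all parts
    image_area   : area over which clutter is assumed uniform
    """
    n_parts = len(part_offsets) + 1
    if len(features) < n_parts:
        return 0.0  # this toy model has no occlusion term

    uniform = 1.0 / image_area
    # A hypothesis assigns each part to a distinct feature.
    hypotheses = list(itertools.permutations(range(len(features)), n_parts))

    total = 0.0
    for h in hypotheses:
        anchor = features[h[0]]
        shape = 1.0
        # Parts are scored relative to part 0, which makes the score
        # independent of where the object sits in the image.
        for idx, offset in zip(h[1:], part_offsets):
            rel = (features[idx][0] - anchor[0], features[idx][1] - anchor[1])
            shape *= gaussian_2d(rel, offset, var)
        # The anchor and the unassigned features are uniform under both
        # explanations, so only P-1 uniform factors remain in the ratio.
        total += shape / uniform ** (n_parts - 1)

    # Equal prior weight on every hypothesis (an assumption of this sketch).
    return total / len(hypotheses)
```

A ratio above 1 favours the category model and a ratio below 1 favours clutter; in practice the decision threshold would be tuned on held-out data. Because positions are measured relative to one part, shifting every feature by the same amount leaves the ratio unchanged, which mirrors the translation invariance the abstract describes. Scale invariance is not modelled here.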

Author information

Authors and Affiliations

Department of Engineering Science, University of Oxford, Parks Road, Oxford, OX1 3PJ, U.K.
R. Fergus & A. Zisserman
Department of Electrical Engineering, California Institute of Technology, MC 136-93, Pasadena, CA, 91125, U.S.A.
P. Perona

Corresponding author

Correspondence to R. Fergus.

About this article

Cite this article

Fergus, R., Perona, P. & Zisserman, A. Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition. Int J Comput Vision 71, 273–303 (2007). https://doi.org/10.1007/s11263-006-8707-x

Published: 01 July 2006
Issue Date: March 2007
DOI: https://doi.org/10.1007/s11263-006-8707-x
