Abstract
We investigate a method for learning object categories in a weakly supervised manner. Given a set of images known to contain the target category from a similar viewpoint, learning is translation- and scale-invariant, does not require alignment or correspondence between the training images, and is robust to clutter and occlusion. Category models are probabilistic constellations of parts, and their parameters are estimated by maximizing the likelihood of the training data. The model explicitly describes the appearance of the parts, as well as their mutual position, relative scale and probability of detection. Recognition takes place in two stages. First, a feature-finder identifies promising locations for the model's parts. Second, the category model is used to compare the likelihood that the observed features are generated by the category model with the likelihood that they are generated by background clutter. The flexible nature of the model is demonstrated by results on six diverse object categories, including geometrically constrained categories (e.g. faces, cars) and flexible objects (such as animals).
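To make the second recognition stage concrete, the following minimal Python/NumPy sketch scores one hypothesis (one detected feature assigned to each part) by the log-likelihood ratio between the category model and background clutter. It is a simplified illustration under stated assumptions, not the authors' implementation. The normalization scheme (subtract the first part's location, divide by the mean feature scale), the uniform background shape density over a square of side `bg_extent`, the single shared Gaussian background appearance model, and all function names, dictionary keys and toy parameter values are hypothetical. The detection-probability (occlusion) term and the search over hypotheses are omitted.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of the multivariate Gaussian N(mean, cov) at x."""
    x = np.asarray(x, dtype=float).ravel()
    diff = x - np.asarray(mean, dtype=float).ravel()
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + maha)


def normalize_positions(xy, scales):
    """Map part locations into a translation- and scale-invariant frame.

    Assumed scheme: subtract the first part's location, then divide by the
    mean feature scale of the hypothesis. The first part always lands at the
    origin, so it is dropped to keep the shape covariance non-singular.
    """
    xy = np.asarray(xy, dtype=float)
    rel = (xy - xy[0]) / float(np.mean(scales))
    return rel[1:].ravel()


def log_likelihood_ratio(xy, scales, descriptors, model):
    """log p(features | category) - log p(features | clutter) for one hypothesis."""
    # Shape: joint Gaussian over normalized positions vs. a uniform density
    # over a square of side bg_extent for each relative position (assumed).
    rel = normalize_positions(xy, scales)
    shape_fg = gaussian_logpdf(rel, model["shape_mean"], model["shape_cov"])
    shape_bg = -rel.size * np.log(model["bg_extent"])
    # Appearance: per-part Gaussian vs. one shared background Gaussian (assumed).
    appearance = 0.0
    for p, desc in enumerate(descriptors):
        appearance += gaussian_logpdf(desc, model["app_mean"][p], model["app_cov"][p])
        appearance -= gaussian_logpdf(desc, model["bg_mean"], model["bg_cov"])
    return (shape_fg - shape_bg) + appearance


# Toy 3-part model with 2-D appearance descriptors (hypothetical values).
D = 2
model = {
    "shape_mean": np.array([2.0, 0.0, 1.0, 2.0]),  # parts 2, 3 relative to part 1
    "shape_cov": 0.05 * np.eye(4),
    "bg_extent": 10.0,
    "app_mean": [np.array([1.0, 1.0]), np.array([-1.0, -1.0]), np.array([1.0, -1.0])],
    "app_cov": [0.1 * np.eye(D)] * 3,
    "bg_mean": np.zeros(D),
    "bg_cov": np.eye(D),
}

scales = [1.0, 1.0, 1.0]
good_xy = [(5.0, 5.0), (7.0, 5.0), (6.0, 7.0)]  # matches the mean shape
bad_xy = [(5.0, 5.0), (5.0, 9.0), (9.0, 5.0)]   # implausible configuration
good_score = log_likelihood_ratio(good_xy, scales, model["app_mean"], model)
bad_score = log_likelihood_ratio(bad_xy, scales, [np.zeros(D)] * 3, model)
print(f"good: {good_score:.2f}  bad: {bad_score:.2f}")
```

A hypothesis would be accepted as an instance of the category when its score exceeds a threshold. Because positions are normalized before the shape term is evaluated, translating all features together and rescaling their positions and scales by a common factor leaves the score unchanged in this sketch.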
Cite this article
Fergus, R., Perona, P. & Zisserman, A. Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition. Int J Comput Vision 71, 273–303 (2007). https://doi.org/10.1007/s11263-006-8707-x