Weakly Supervised Learning of Visual Models and Its Application to Content-Based Retrieval

International Journal of Computer Vision

Abstract

This paper presents a method for weakly supervised learning of visual models. The visual model is based on a two-layer image description: a set of “generic” descriptors and their distribution over neighbourhoods. “Generic” descriptors represent sets of similar, rotationally invariant feature vectors. Statistical spatial constraints describe the neighbourhood structure and make the description more discriminative. The joint probability of the frequencies of “generic” descriptors over a neighbourhood is multi-modal and is represented by a set of “neighbourhood-frequency” clusters. The resulting image description is rotationally invariant, robust to model deformations, and efficiently characterizes “appearance-based” visual structure. Model features are determined by selecting distinctive clusters, that is, clusters that are common in the positive examples and rare in the negative ones. Visual models are retrieved and localized using a probabilistic score. Experimental results for “textured” animals and faces show very good performance for both retrieval and localization.
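To make the two-layer description more concrete, the following is a minimal sketch, not the paper's implementation. It assumes that local feature vectors and their image positions are already available as NumPy arrays, uses plain k-means as a stand-in for whatever clustering the paper uses to form “generic” descriptors, and scores clusters with a simple positive/negative frequency ratio. All function names, the `radius` parameter, and the scoring formula are hypothetical choices made for illustration only.

```python
import numpy as np


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (assumed stand-in for forming "generic" descriptors).

    Returns (centres, labels): centres has shape (k, d), labels has shape (n,).
    """
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every point to every centre, shape (n, k).
        d2 = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # leave an empty cluster's centre unchanged
                centres[j] = members.mean(axis=0)
    d2 = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return centres, d2.argmin(axis=1)


def neighbourhood_frequencies(positions, labels, n_generic, radius):
    """Second layer: for each point, the normalised histogram of generic-descriptor
    labels among all points within `radius` (the point itself included)."""
    d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(axis=2)
    within = (d2 <= radius ** 2).astype(float)   # (n, n) neighbourhood mask
    one_hot = np.eye(n_generic)[labels]          # (n, n_generic)
    counts = within @ one_hot                    # label counts per neighbourhood
    return counts / counts.sum(axis=1, keepdims=True)


def distinctiveness(cluster_ids_pos, cluster_ids_neg, n_clusters, eps=1e-6):
    """Hypothetical selection score: relative frequency of each
    neighbourhood-frequency cluster in positive examples, divided by its
    combined relative frequency in positive and negative examples."""
    pos = np.bincount(cluster_ids_pos, minlength=n_clusters) / max(len(cluster_ids_pos), 1)
    neg = np.bincount(cluster_ids_neg, minlength=n_clusters) / max(len(cluster_ids_neg), 1)
    return pos / (pos + neg + eps)
```

Under these assumptions, one would run `kmeans` on the feature vectors, pass the resulting labels to `neighbourhood_frequencies`, cluster those frequency vectors again (for example with the same `kmeans`), and keep the clusters with the highest `distinctiveness` as model features. The probabilistic retrieval score mentioned in the abstract is not sketched here, since the abstract does not specify its form.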



About this article

Cite this article

Schmid, C. Weakly Supervised Learning of Visual Models and Its Application to Content-Based Retrieval. International Journal of Computer Vision 56, 7–16 (2004). https://doi.org/10.1023/B:VISI.0000004829.38247.b0

  • DOI: https://doi.org/10.1023/B:VISI.0000004829.38247.b0
