Abstract
This paper presents a method for weakly supervised learning of visual models. The visual model is based on a two-layer image description: a set of “generic” descriptors and their distribution over neighbourhoods. “Generic” descriptors represent sets of similar, rotationally invariant feature vectors. Statistical spatial constraints describe the neighbourhood structure and make our description more discriminative. The joint probability of the frequencies of “generic” descriptors over a neighbourhood is multi-modal and is represented by a set of “neighbourhood-frequency” clusters. Our image description is rotationally invariant, robust to model deformations, and efficiently characterizes “appearance-based” visual structure. Model features are determined by selecting distinctive clusters, that is, clusters that are common in the positive examples and rare in the negative ones. Visual models are retrieved and localized using a probabilistic score. Experimental results for “textured” animals and faces show very good performance for both retrieval and localization.
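The second layer of the description can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes each image point has already been assigned the index of a “generic” descriptor, and the names `neighbourhood_frequencies` and `distinctiveness`, the circular neighbourhood of fixed `radius`, and the smoothed ratio used as a selection score are all hypothetical choices made for illustration only.

```python
from collections import Counter
from math import hypot


def neighbourhood_frequencies(points, labels, n_labels, radius):
    """Relative frequency of each generic-descriptor label around every point.

    Assumption: the neighbourhood is a disc of fixed `radius`, so the counts
    do not change when the image is rotated.
    """
    histograms = []
    for x0, y0 in points:
        counts = Counter(
            lab
            for (x, y), lab in zip(points, labels)
            if hypot(x - x0, y - y0) <= radius
        )
        total = sum(counts.values())  # at least 1: a point is its own neighbour
        histograms.append([counts[k] / total for k in range(n_labels)])
    return histograms


def distinctiveness(pos_count, neg_count, n_pos, n_neg, eps=1.0):
    """Hypothetical score in (0, 1): high when a cluster occurs in many
    positive examples and few negative ones. `eps` smooths small counts."""
    p = (pos_count + eps) / (n_pos + 2 * eps)
    q = (neg_count + eps) / (n_neg + 2 * eps)
    return p / (p + q)


# Toy data: four points, two generic descriptors (labels 0 and 1).
points = [(0, 0), (1, 0), (0, 1), (10, 10)]
labels = [0, 1, 1, 0]
hists = neighbourhood_frequencies(points, labels, n_labels=2, radius=1.5)
```

In a fuller pipeline these frequency vectors would be clustered into “neighbourhood-frequency” clusters, and a score such as `distinctiveness` would be evaluated per cluster over the positive and negative example sets.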
Cite this article
Schmid, C. Weakly Supervised Learning of Visual Models and Its Application to Content-Based Retrieval. International Journal of Computer Vision 56, 7–16 (2004). https://doi.org/10.1023/B:VISI.0000004829.38247.b0
DOI: https://doi.org/10.1023/B:VISI.0000004829.38247.b0