Weakly Supervised Learning of Visual Models and Its Application to Content-Based Retrieval

  • Cordelia Schmid
Article

Abstract

This paper presents a method for weakly supervised learning of visual models. The visual model is based on a two-layer image description: a set of “generic” descriptors and their distribution over neighbourhoods. “Generic” descriptors represent sets of similar rotationally invariant feature vectors. Statistical spatial constraints describe the neighbourhood structure and make our description more discriminative. The joint probability of the frequencies of “generic” descriptors over a neighbourhood is multi-modal and is represented by a set of “neighbourhood-frequency” clusters. Our image description is rotationally invariant, robust to model deformations, and efficiently characterizes “appearance-based” visual structure. Model features are determined by selecting distinctive clusters, that is, clusters that are common in the positive examples and rare in the negative ones. Visual models are retrieved and localized using a probabilistic score. Experimental results for “textured” animals and faces show very good performance for both retrieval and localization.

visual model, two-layer image description, weakly supervised learning

Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Cordelia Schmid, INRIA Rhône-Alpes, Montbonnot, France
