Finding pictures of objects in large collections of images

  • David A. Forsyth
  • Jitendra Malik
  • Margaret M. Fleck
  • Hayit Greenspan
  • Thomas Leung
  • Serge Belongie
  • Chad Carson
  • Chris Bregler
3D Representations and Applications
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1144)


Retrieving images from very large collections, using image content as a key, is becoming an important problem. Users prefer to ask for pictures using notions of content that are strongly oriented to the presence of abstractly defined objects. Computer programs that implement these queries automatically are desirable, but are hard to build because conventional object recognition techniques from computer vision cannot recognize very general objects in very general contexts.

This paper describes our approach to object recognition, which is structured around a sequence of increasingly specialized grouping activities that assemble coherent image regions that can be shown to satisfy increasingly stringent constraints. The constraints that are satisfied provide a form of object classification in quite general contexts.
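As a rough illustration of the idea, and not the authors' implementation, the sequence of increasingly stringent constraints can be sketched as a cascade in which a region keeps only the labels of the consecutive stages it passes. The `Region` fields, the stage labels, and all thresholds below are hypothetical and chosen only to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Region:
    """Hypothetical summary of a grouped image region (not from the paper)."""
    mean_hue: float        # degrees, 0-360
    texture_energy: float  # arbitrary units; low values mean smooth
    elongation: float      # ratio of major to minor axis length


Constraint = Callable[[Region], bool]

# Hypothetical stages, ordered from general to specific.
# Thresholds are illustrative assumptions, not values from the paper.
STAGES: List[Tuple[str, Constraint]] = [
    ("green region", lambda r: 70.0 <= r.mean_hue <= 170.0),
    ("textured green region", lambda r: r.texture_energy > 0.5),
    ("elongated textured green region", lambda r: r.elongation > 2.0),
]


def classify(region: Region,
             stages: List[Tuple[str, Constraint]] = STAGES) -> List[str]:
    """Return the labels of all consecutive stages passed, stopping at the first failure."""
    passed: List[str] = []
    for label, constraint in stages:
        if not constraint(region):
            break
        passed.append(label)
    return passed
```

In this sketch the last label returned is the most specific description the region supports; an empty list means the region failed the most general test and is not considered further.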

This view of recognition is distinguished by far richer involvement of early visual primitives, including color and texture; by hierarchical grouping and learning strategies in the classification process; and by the ability to deal with rather general objects in uncontrolled configurations and contexts. We illustrate these properties with four case studies: one demonstrating the use of color and texture descriptors; one showing how trees can be described by fusing texture and geometric properties; one learning scenery concepts using grouped features; and one showing how this view of recognition yields a program that can tell, quite accurately, whether a picture contains naked people or not.


Keywords: Computer Vision; Image Database; Color Histogram; Image Annotation; Texture Region





Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • David A. Forsyth (1)
  • Jitendra Malik (1)
  • Margaret M. Fleck (2)
  • Hayit Greenspan (1, 3)
  • Thomas Leung (1)
  • Serge Belongie (1)
  • Chad Carson (1)
  • Chris Bregler (1)

  1. Computer Science Division, University of California at Berkeley, Berkeley
  2. Dept. of Computer Science, University of Iowa, Iowa City
  3. Dept. of Electrical Engineering, Caltech, Pasadena
