3D Object Modeling and Recognition from Photographs and Image Sequences

  • Fred Rothganger
  • Svetlana Lazebnik
  • Cordelia Schmid
  • Jean Ponce
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4170)


This chapter proposes a representation of rigid three-dimensional (3D) objects in terms of local affine-invariant descriptors of their images and the spatial relationships between the corresponding surface patches. Geometric constraints associated with different views of the same patches under affine projection are combined with a normalized representation of their appearance to guide the matching process involved in object modeling and recognition tasks. The proposed approach is applied in two domains: (1) Photographs — models of rigid objects are constructed from small sets of images and recognized in highly cluttered shots taken from arbitrary viewpoints. (2) Video — dynamic scenes containing multiple moving objects are segmented into rigid components, and the resulting 3D models are directly matched to each other, giving a novel approach to video indexing and retrieval.
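The role of the geometric constraints under affine projection can be illustrated with a minimal sketch. It assumes point features rather than the affine-invariant patches used in the chapter, and it relies on a standard property of affine cameras: when the image coordinates of n rigid 3D points seen in m views are stacked into a 2m x n matrix and each row is centered, that matrix has rank at most 3. A false match generally raises the rank, which is one simple way spatial consistency can filter candidate correspondences. The function names, the cube, and the three cameras below are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from itertools import product

def project_affine(A, b, P):
    """Affine camera: image point p = A P + b, with A 2x3 and b a 2-vector."""
    return tuple(sum(A[r][k] * P[k] for k in range(3)) + b[r] for r in range(2))

def centered_measurements(views):
    """Stack the image coordinates of all views into a 2m x n matrix,
    subtracting each row's mean (this cancels the translations b)."""
    rows = []
    for pts in views:
        for r in range(2):
            row = [p[r] for p in pts]
            mean = sum(row) / len(row)
            rows.append([v - mean for v in row])
    return rows

def numerical_rank(M, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]  # work on a copy
    n_rows, n_cols = len(M), len(M[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < tol:
            continue  # no usable pivot in this column
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, n_rows):
            f = M[r][c] / M[rank][c]
            for k in range(c, n_cols):
                M[r][k] -= f * M[rank][k]
        rank += 1
    return rank

# Hypothetical data: the 8 corners of a cube seen by three made-up affine cameras.
points = [tuple(float(v) for v in c) for c in product((-1, 1), repeat=3)]
cameras = [
    ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], (0.0, 0.0)),
    ([[0.8, 0.0, 0.6], [0.0, 1.0, 0.0]], (2.0, 1.0)),
    ([[1.0, 0.0, 0.0], [0.0, 0.6, -0.8]], (-1.0, 3.0)),
]
views = [[project_affine(A, b, P) for P in points] for A, b in cameras]
clean_rank = numerical_rank(centered_measurements(views))

# Simulate a false match: displace the first point in the second view only.
bad_views = [list(v) for v in views]
x, y = bad_views[1][0]
bad_views[1][0] = (x + 0.5, y - 0.3)
corrupt_rank = numerical_rank(centered_measurements(bad_views))

print("rank with consistent matches:", clean_rank)
print("rank with one false match:", corrupt_rank)
```

With real, noisy measurements an exact rank test is too brittle; a practical variant would threshold a residual such as the reprojection error instead, or compute the rank with a tolerance via a routine like `numpy.linalg.matrix_rank`. The pure-Python elimination is used here only to keep the sketch free of dependencies.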


Keywords (added by machine, not by the authors): Computer Vision, Surface Patch, Video Shot, SIFT Descriptor, Reprojection Error





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Fred Rothganger (1)
  • Svetlana Lazebnik (1)
  • Cordelia Schmid (2)
  • Jean Ponce (1)
  1. Department of Computer Science and Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, USA
  2. INRIA Rhône-Alpes, Montbonnot, France
