International Journal of Computer Vision, Volume 66, Issue 3, pp 231–259

3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints

  • Fred Rothganger
  • Svetlana Lazebnik
  • Cordelia Schmid
  • Jean Ponce
Article

Abstract.

This article introduces a novel representation for three-dimensional (3D) objects in terms of local affine-invariant descriptors of their images and the spatial relationships between the corresponding surface patches. Geometric constraints associated with different views of the same patches under affine projection are combined with a normalized representation of their appearance to guide matching and reconstruction, allowing the acquisition of true 3D affine and Euclidean models from multiple unregistered images, as well as their recognition in photographs taken from arbitrary viewpoints. The proposed approach does not require a separate segmentation stage, and it is applicable to highly cluttered scenes. Modeling and recognition results are presented.
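As a hedged illustration of the kind of multi-view constraint that affine projection provides, the sketch below checks the classical rank property: when the same 3D points are seen in several affine views, the centered matrix of their image coordinates has rank at most 3. This is a generic textbook-style check, not the authors' algorithm; the function names (`affine_rank_spectrum`, `synthetic_tracks`) and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def affine_rank_spectrum(tracks):
    """Singular values of the centered measurement matrix.

    tracks: array of shape (views, points, 2) holding the image
    coordinates of the same points in every view (hypothetical layout).
    Under affine projection, at most 3 singular values are non-zero.
    """
    views, points, _ = tracks.shape
    # Subtracting each view's centroid removes the translation term.
    centered = tracks - tracks.mean(axis=1, keepdims=True)
    # Stack the x and y rows of every view: shape (2 * views, points).
    W = centered.transpose(0, 2, 1).reshape(2 * views, points)
    return np.linalg.svd(W, compute_uv=False)


def synthetic_tracks(views=4, points=12, seed=0):
    """Random 3D points seen by random affine cameras (illustrative data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(points, 3))    # 3D points
    A = rng.normal(size=(views, 2, 3))  # affine projection matrices
    t = rng.normal(size=(views, 1, 2))  # image-plane translations
    # x_ij = A_i X_j + t_i for view i and point j.
    return np.einsum("vij,pj->vpi", A, X) + t


if __name__ == "__main__":
    good = synthetic_tracks()
    print("consistent matches:", affine_rank_spectrum(good)[:5])

    bad = good.copy()
    bad[1, 3] += np.array([5.0, -5.0])  # corrupt one match in one view
    print("one bad match:     ", affine_rank_spectrum(bad)[:5])
```

With consistent matches the fourth singular value is numerically zero, while a single corrupted match makes it clearly non-zero, which is why a rank test of this sort can serve to reject geometrically inconsistent matches.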

Keywords:

three-dimensional object recognition; image-based modeling; affine-invariant image descriptors; multi-view geometry



Copyright information

© Springer Science + Business Media, Inc. 2006

Authors and Affiliations

  • Fred Rothganger (1)
  • Svetlana Lazebnik (1)
  • Cordelia Schmid (2)
  • Jean Ponce (3)
  1. Department of Computer Science and Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, USA
  2. INRIA, Montbonnot, France
  3. Department of Computer Science and Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, USA
