
Simultaneous Object Recognition and Segmentation from Single or Multiple Model Views

International Journal of Computer Vision

Abstract

We present a novel object recognition approach based on affine invariant regions. It actively counters the problems related to the limited repeatability of the region detectors and to the difficulty of matching in the presence of large amounts of background clutter and particularly challenging viewing conditions. After producing an initial set of matches, the method gradually explores the surrounding image areas, recursively constructing more and more matching regions, increasingly farther from the initial ones. This process covers the object with matches and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. The approach includes a mechanism for capturing the relationships between multiple model views and exploiting them to integrate the contributions of the views at recognition time. This mechanism is based on an efficient algorithm for partitioning a set of region matches into groups lying on smooth surfaces. Integration is achieved by measuring the consistency of configurations of groups arising from different model views. Experimental results demonstrate the strength of the approach in dealing with extensive clutter, dominant occlusion, and large scale and viewpoint changes. Non-rigid deformations are explicitly taken into account, and the approximate contours of the object are produced. All presented techniques can extend any viewpoint-invariant feature extractor.
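The expand-and-filter idea described above can be illustrated with a toy sketch. It is not the authors' algorithm: it assumes each match is a model-image point paired with a 2D affine map into the test image, that every propagated match simply inherits the affine map of the match it grew from (rather than having its region shape refined), and that `make_scorer` is a hypothetical, simulated stand-in for a photometric similarity test such as normalized cross-correlation. All names (`explore`, `filter_seeds`, `make_scorer`, `radius`, `threshold`, `min_support`) are illustrative assumptions.

```python
import math
from collections import deque

# A 2D affine map stored as (a, b, tx, c, d, ty): (x, y) -> (a*x + b*y + tx, c*x + d*y + ty)


def apply(A, p):
    a, b, tx, c, d, ty = A
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)


def make_scorer(true_A, tol=1.0):
    """Hypothetical stand-in for patch similarity (e.g. normalized cross-correlation).

    Simulated: returns 1.0 when test_pt is exactly where the synthetic true object
    pose sends model_pt, decaying linearly to 0.0 at distance `tol`.
    """
    def score(model_pt, test_pt):
        gx, gy = apply(true_A, model_pt)
        return max(0.0, 1.0 - math.hypot(gx - test_pt[0], gy - test_pt[1]) / tol)
    return score


def explore(seeds, coverage, score, radius=1.5, threshold=0.5):
    """Grow matches outward from the seeds (breadth-first).

    seeds: list of (model_pt, affine) initial matches, some possibly wrong.
    coverage: model-image points the expansion may try to match.
    Returns {model_pt: index of the seed whose affine map produced the match}.
    """
    owner = {}
    queue = deque((pt, i) for i, (pt, _) in enumerate(seeds))
    while queue:
        pt, i = queue.popleft()
        A = seeds[i][1]
        for q in coverage:
            if q in owner or math.dist(pt, q) > radius:
                continue
            # Predict where q lies in the test image with the expanding match's
            # affine map; accept only if the (simulated) appearance test agrees.
            if score(q, apply(A, q)) >= threshold:
                owner[q] = i
                queue.append((q, i))  # accepted matches are expanded in turn
    return owner


def filter_seeds(seeds, owner, min_support=3):
    """Keep seeds whose expansion produced at least `min_support` verified matches."""
    support = [0] * len(seeds)
    for i in owner.values():
        support[i] += 1
    kept = [i for i, s in enumerate(support) if s >= min_support]
    return kept, support


if __name__ == "__main__":
    true_A = (0.9, -0.2, 5.0, 0.2, 0.9, 3.0)  # synthetic object pose
    score = make_scorer(true_A)
    coverage = [(float(x), float(y)) for x in range(6) for y in range(6)]
    seeds = [
        ((0.0, 0.0), true_A),                             # correct initial match
        ((5.0, 5.0), (1.0, 0.0, -40.0, 0.0, 1.0, 25.0)),  # wrong initial match
    ]
    owner = explore(seeds, coverage, score)
    kept, support = filter_seeds(seeds, owner)
    print(kept, support)  # → [0] [36, 0]
```

In this toy setting the correct seed grows into a connected set covering all 36 grid regions, while the wrong seed fails verification at every neighbouring region and ends with zero support. This is the sense in which expansion both covers the object and separates correct from wrong matches; the set of regions claimed by kept seeds acts as a crude segmentation in model coordinates.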

Author information

Correspondence to Vittorio Ferrari.

Additional information

This research was supported by EC project VIBES, the Fund for Scientific Research Flanders, and the IST Network of Excellence PASCAL.

About this article

Cite this article

Ferrari, V., Tuytelaars, T. & Van Gool, L. Simultaneous Object Recognition and Segmentation from Single or Multiple Model Views. Int J Comput Vision 67, 159–188 (2006). https://doi.org/10.1007/s11263-005-3964-7

  • DOI: https://doi.org/10.1007/s11263-005-3964-7