Abstract
We present a novel object recognition approach based on affine invariant regions. It actively counters the problems caused by the limited repeatability of region detectors and by the difficulty of matching in the presence of large amounts of background clutter and particularly challenging viewing conditions. After producing an initial set of matches, the method gradually explores the surrounding image areas, recursively constructing more and more matching regions, increasingly farther from the initial ones. This process covers the object with matches and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. The approach includes a mechanism for capturing the relationships between multiple model views and exploiting them to integrate the contributions of the views at recognition time. This mechanism is based on an efficient algorithm for partitioning a set of region matches into groups lying on smooth surfaces. Integration is achieved by measuring the consistency of configurations of groups arising from different model views. Experimental results demonstrate the strength of the approach in dealing with extensive clutter, dominant occlusion, and large scale and viewpoint changes. Non-rigid deformations are explicitly taken into account, and the approximate contours of the object are produced. All presented techniques can extend any viewpoint-invariant feature extractor.
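The exploration idea summarized above, where correct matches keep growing into their surroundings while wrong ones soon stop, can be illustrated with a toy seed-and-grow sketch. It is not the authors' algorithm. The image is reduced to a hypothetical grid of cells, each initial match becomes a (cell, affine map) pair, and a hypothetical `similarity` callback stands in for comparing image content under a map. The names `explore`, `recognise`, `threshold` and `min_support` are illustrative assumptions. Seeds whose regions reach at least `min_support` cells are kept, and the union of their regions serves as a crude segmentation.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

Cell = Tuple[int, int]  # (row, col) of a cell in a hypothetical model-image grid
Affine = Tuple[float, float, float, float, float, float]  # (a, b, c, d, tx, ty)
Seed = Tuple[Cell, Affine]  # an initial match: model cell plus its affine map

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def explore(
    seeds: List[Seed],
    grid_shape: Tuple[int, int],
    similarity: Callable[[Cell, Affine], float],
    threshold: float = 0.5,
) -> Dict[Seed, Set[Cell]]:
    """Grow one region per seed by breadth-first search over neighbouring cells.

    A neighbour joins the region if it still looks similar when mapped with the
    seed's affine map (a stand-in for constructing a new matching region).
    """
    rows, cols = grid_shape
    regions: Dict[Seed, Set[Cell]] = {}
    for seed in seeds:
        start, affine = seed
        region = {start}
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for dr, dc in NEIGHBOURS:
                nb = (r + dr, c + dc)
                inside = 0 <= nb[0] < rows and 0 <= nb[1] < cols
                if inside and nb not in region and similarity(nb, affine) >= threshold:
                    region.add(nb)
                    queue.append(nb)
        regions[seed] = region
    return regions


def recognise(
    seeds: List[Seed],
    grid_shape: Tuple[int, int],
    similarity: Callable[[Cell, Affine], float],
    min_support: int = 4,
) -> Tuple[List[Seed], Set[Cell]]:
    """Keep seeds whose regions grew large; their union is a rough segmentation."""
    regions = explore(seeds, grid_shape, similarity)
    kept = [s for s, region in regions.items() if len(region) >= min_support]
    covered: Set[Cell] = set().union(*(regions[s] for s in kept))
    return kept, covered


if __name__ == "__main__":
    TRUE_MAP: Affine = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
    WRONG_MAP: Affine = (1.0, 0.0, 0.0, 1.0, 5.0, 0.0)

    # Toy scene: the object occupies columns 0-2 of a 4x6 grid, and only the
    # true map makes its cells look similar.
    def toy_similarity(cell: Cell, affine: Affine) -> float:
        return 1.0 if cell[1] <= 2 and affine == TRUE_MAP else 0.0

    seeds = [((0, 0), TRUE_MAP), ((2, 4), WRONG_MAP), ((3, 1), TRUE_MAP)]
    kept, covered = recognise(seeds, (4, 6), toy_similarity)
    print(len(kept), len(covered))  # prints "2 12"
```

In this toy run the wrong seed cannot grow beyond its own cell and is discarded, while both correct seeds cover the 12 object cells. Region size plays the role of the support that separates correct initial matches from wrong ones.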
Additional information
This research was supported by EC project VIBES, the Fund for Scientific Research Flanders, and the IST Network of Excellence PASCAL.
About this article
Cite this article
Ferrari, V., Tuytelaars, T. & Van Gool, L. Simultaneous Object Recognition and Segmentation from Single or Multiple Model Views. Int J Comput Vision 67, 159–188 (2006). https://doi.org/10.1007/s11263-005-3964-7