Simultaneous Object Recognition and Segmentation by Image Exploration

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3021)


Methods based on local, viewpoint-invariant features have proven capable of recognizing objects in spite of viewpoint changes, occlusion and clutter. However, these approaches fail when these factors are too strong, because the features have limited repeatability and discriminative power. As additional shortcomings, the objects need to be rigid and only their approximate location is found. We present a novel object recognition approach which overcomes these limitations. An initial set of feature correspondences is first generated. The method anchors on it and then gradually explores the surrounding area, trying to construct more and more matching features, increasingly farther from the initial ones. The resulting process covers the object with matches, and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. Very few correct initial matches suffice for reliable recognition. The experimental results demonstrate the method's greater robustness to extensive clutter, dominant occlusion, and large scale and viewpoint changes. Moreover, non-rigid deformations are explicitly taken into account, and the approximate contours of the object are produced. The approach can extend any viewpoint-invariant feature extractor.
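The exploration idea in the abstract can be illustrated with a toy sketch. The code below is not the authors' algorithm: it assumes the model image is divided into a hypothetical grid of regions, and it replaces the real geometric and appearance checks with a caller-supplied predicate `is_consistent`. The names `explore_matches`, `is_consistent`, `min_support` and `object_cells` are all assumptions made for illustration. Each seed match is grown outward to neighbouring regions, and seeds that fail to gather enough supporting regions are discarded as probable mismatches.

```python
from collections import deque
from typing import Callable, Iterable, Set, Tuple

Cell = Tuple[int, int]  # (row, col) of a region in a hypothetical model-image grid


def explore_matches(
    seeds: Iterable[Cell],
    grid_shape: Tuple[int, int],
    is_consistent: Callable[[Cell, Cell], bool],
    min_support: int = 3,
) -> Set[Cell]:
    """Grow matched regions outward from each seed and keep well-supported ones.

    is_consistent(src, dst) is a hypothetical stand-in for the real test: it
    should return True when region dst can be matched in the test image by
    propagating the match already established for its neighbour src.
    """
    rows, cols = grid_shape
    accepted: Set[Cell] = set()
    for seed in seeds:
        if seed in accepted:
            continue  # already covered by the expansion of an earlier seed
        region = {seed}
        frontier = deque([seed])
        while frontier:
            r, c = frontier.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (r + dr, c + dc)
                if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                    continue  # outside the grid
                if nxt in region:
                    continue  # already matched
                if is_consistent((r, c), nxt):
                    region.add(nxt)
                    frontier.append(nxt)
        # A seed that cannot spread to its surroundings is treated as a mismatch.
        if len(region) >= min_support:
            accepted |= region
    return accepted


# Toy scene: the object covers a 3x3 block of a 5x6 grid.
object_cells = {(r, c) for r in range(1, 4) for c in range(1, 4)}
result = explore_matches(
    seeds=[(2, 2), (0, 5)],  # one correct seed on the object, one wrong seed off it
    grid_shape=(5, 6),
    is_consistent=lambda src, dst: src in object_cells and dst in object_cells,
)
```

In this toy scene the single correct seed spreads over the whole object while the wrong seed gathers no support and is dropped, so `result` equals `object_cells`. The set of accepted cells then plays the role of both the recognition evidence and a coarse segmentation.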


Keywords: Test Image, Model Image, Expansion Phase, Correct Match, Viewpoint Change



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  1. Computer Vision Group (BIWI), ETH Zuerich, Switzerland
  2. ESAT-PSI, University of Leuven, Belgium
