Self-similar Sketch

  • Andrea Vedaldi
  • Andrew Zisserman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7573)


We introduce the self-similar sketch, a new method for extracting intermediate image features that combines three principles: detection of self-similar structures, nonaccidental alignment, and instance-specific modelling. The method searches for self-similar image structures that form nonaccidental patterns, for example collinear arrangements. We demonstrate a simple implementation of this idea in which self-similar structures are found by looking for SIFT descriptors that map to the same visual words in image-specific vocabularies. This results in a visual word map, which is searched for elongated connected components. Finally, segments are fitted to these connected components. This extracts linear image structures beyond those captured by conventional edge detectors, which implicitly assume a specific appearance for edges (steps). The resulting collection of segments constitutes a “sketch” of the image. We apply the sketch to the task of estimating vanishing points, horizon, and zenith on standard benchmark data, obtaining state-of-the-art results. We also propose a new vanishing point estimation algorithm based on recently introduced techniques for the continuous-discrete optimisation of energies arising from model selection priors.
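The last two stages of the pipeline described above (searching a visual word map for elongated connected components, then fitting a segment to each) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the word map is already available as a small grid of integer word ids (in the paper these come from quantising SIFT descriptors with an image-specific vocabulary, which is not reproduced here). The function names, the 8-connectivity, the elongation test based on the ratio of principal standard deviations, and the thresholds `min_cells` and `min_elongation` are all hypothetical choices made for illustration.

```python
from collections import deque
import math


def connected_components(word_map):
    """Group 8-connected grid cells that share the same visual word id."""
    rows, cols = len(word_map), len(word_map[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            word, cells, queue = word_map[r][c], [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                cells.append((x, y))
                for ny in (y - 1, y, y + 1):
                    for nx in (x - 1, x, x + 1):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and word_map[ny][nx] == word):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append((word, cells))
    return components


def fit_segment(cells, min_cells=4, min_elongation=5.0):
    """Fit a segment to a component; return None if it is not elongated.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    n = len(cells)
    if n < min_cells:
        return None
    mx = sum(x for x, _ in cells) / n
    my = sum(y for _, y in cells) / n
    sxx = sum((x - mx) ** 2 for x, _ in cells) / n
    syy = sum((y - my) ** 2 for _, y in cells) / n
    sxy = sum((x - mx) * (y - my) for x, y in cells) / n
    # Closed-form eigenvalues of the 2x2 coordinate covariance matrix.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    major = half_trace + spread
    minor = max(half_trace - spread, 0.0)
    # Small floor so a perfectly thin line does not divide by zero.
    if math.sqrt(major / (minor + 1e-9)) < min_elongation:
        return None
    # Direction of the major axis, then project cells onto it for endpoints.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    proj = [(x - mx) * dx + (y - my) * dy for x, y in cells]
    lo, hi = min(proj), max(proj)
    return ((mx + lo * dx, my + lo * dy), (mx + hi * dx, my + hi * dy))


def sketch(word_map):
    """Return the segments fitted to all elongated components."""
    fitted = (fit_segment(cells) for _, cells in connected_components(word_map))
    return [seg for seg in fitted if seg is not None]


if __name__ == "__main__":
    # Toy word map: two one-cell-thick rows, one large blob, one 2x2 block.
    toy_map = [
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 0],
        [2, 2, 0, 0, 0, 0],
        [2, 2, 0, 0, 0, 0],
    ]
    for segment in sketch(toy_map):
        print(segment)
```

Because components are defined by equal word ids rather than by intensity gradients, a test of this kind does not depend on the structure looking like a step edge, which is the property the abstract contrasts with conventional edge detectors.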


Keywords: self-similarity, feature detector, vanishing point estimation, UFL



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Andrea Vedaldi (1)
  • Andrew Zisserman (1)
  1. Department of Engineering Science, University of Oxford, UK
