When standard RANSAC is not enough: cross-media visual matching with hypothesis relevancy

  • Original Paper
  • Published in: Machine Vision and Applications

Abstract

The same scene can be depicted by multiple visual media. For example, the same event can be captured by a comic image or a movie frame; the same object can be represented by a photograph or by a 3D computer graphics model. In order to extract the visual analogies that are at the heart of cross-media analysis, spatial matching is required. This matching is commonly achieved by extracting key points and scoring multiple, randomly generated mapping hypotheses. The more consensus a hypothesis can draw, the higher its score. In this paper, we go beyond the conventional set-size measure of match quality and present a more general hypothesis score that attempts to reflect how likely each hypothesized transformation is to be the correct one for the matching task at hand. This is achieved by considering additional, contextual cues for the relevance of a hypothesized transformation. This context changes from one matching task to another and reflects different properties of the match, beyond the size of a consensus set. We demonstrate that by learning how to correctly score each hypothesis based on these features, we can handle the challenges of cross-media analysis far more robustly, producing correct matches where conventional methods fail.
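
To make the scoring idea concrete, the sketch below shows a generic RANSAC loop in which the hypothesis score is pluggable rather than hard-wired to the inlier count. This is a minimal illustration, not the authors' implementation (their MATLAB code is linked in Note 1 below); the similarity-transform model, the 3-pixel tolerance, and all names (`score_fn`, `consensus_size_score`, etc.) are illustrative assumptions.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2D similarity transform (scale, rotation, translation)
    mapping src points to dst points, in closed form via SVD
    (Procrustes/Umeyama). src, dst: (n, 2) arrays of matched key points."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s, d = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(d.T @ s)
    R = U @ Vt
    if np.linalg.det(R) < 0:            # forbid reflections
        U[:, -1] *= -1
        R = U @ Vt
    scale = S.sum() / (s ** 2).sum()
    t = mu_d - scale * R @ mu_s
    return scale, R, t

def consensus_set(hyp, src, dst, tol=3.0):
    """Boolean inlier mask: correspondences that the hypothesized
    transformation maps to within tol pixels of their targets."""
    scale, R, t = hyp
    pred = scale * (src @ R.T) + t
    return np.linalg.norm(pred - dst, axis=1) < tol

def consensus_size_score(hyp, inliers, src, dst):
    """Conventional RANSAC score: the size of the consensus set."""
    return float(inliers.sum())

def ransac(src, dst, score_fn=consensus_size_score,
           iters=1000, sample_size=2, seed=0):
    """Generic RANSAC: generate random minimal-sample hypotheses and
    keep the one that score_fn rates highest. Swapping score_fn for a
    learned relevancy score changes the selection criterion without
    touching the sampling loop."""
    rng = np.random.default_rng(seed)
    best_hyp, best_score = None, -np.inf
    for _ in range(iters):
        idx = rng.choice(len(src), size=sample_size, replace=False)
        hyp = fit_similarity(src[idx], dst[idx])
        inliers = consensus_set(hyp, src, dst)
        score = score_fn(hyp, inliers, src, dst)
        if score > best_score:
            best_hyp, best_score = hyp, score
    return best_hyp, best_score
```

Here `consensus_size_score` reproduces standard RANSAC; the approach described in the abstract would substitute a score learned from task-specific contextual cues, allowing the loop to prefer relevant transformations even when their consensus sets are not the largest.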

Notes

  1. Please see the project webpage for available resources, including our MATLAB functions for rendering and computing the transformations. URL: http://www.openu.ac.il/home/hassner/projects/ransaclearn.

  2. Source: http://sketchup.google.com/3dwarehouse.

  3. Source: http://www.minecraft.net.

Acknowledgments

TH was partially funded by General Motors (GM).

Author information

Correspondence to Tal Hassner.

About this article

Cite this article

Hassner, T., Assif, L. & Wolf, L. When standard RANSAC is not enough: cross-media visual matching with hypothesis relevancy. Machine Vision and Applications 25, 971–983 (2014). https://doi.org/10.1007/s00138-013-0571-4
