
Wide-Baseline Image Matching with Projective View Synthesis and Calibrated Geometric Verification


Image matching is a fundamental task in photogrammetry and computer vision. Effective solutions exist for narrow-baseline viewing conditions, typically combining detectors based on, e.g., differences of Gaussians (DoG) with descriptors such as the scale-invariant feature transform (SIFT). Wide-baseline configurations, however, remain a challenging problem, particularly when UAV-based (unmanned aerial vehicle) images have to be matched with images taken from the ground. In this paper, we propose a method for wide-baseline image matching that extends the current state-of-the-art approach, Matching on Demand with View Synthesis (MODS), so that even more extreme wide-baseline problems can be solved. We achieve this (1) by using projective transformations during view synthesis, which overcomes limitations caused by the approximate character of affine transformations, and (2) by estimating the essential matrix within geometric verification, which filters incorrect correspondences more robustly when the camera calibration is known. We have evaluated our approach on several challenging image pairs, most of which combine a UAV-based image with an image taken from the ground, and demonstrate improved performance compared to MODS.
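The two ingredients named above can be illustrated with a minimal sketch. It is not the authors' implementation: all function names are hypothetical, and the projective warp is assumed here to be the homography induced by a pure camera rotation, H = K R K^-1, which is one standard way to build such a warp and not necessarily the parameterization used in the paper. The second part assumes the convention X2 = R X1 + t between the two camera frames, so that the essential matrix is E = [t]x R and correct correspondences in calibrated (normalized) coordinates satisfy x2^T E x1 = 0.

```python
import math

def matmul(A, B):
    """Product of two small matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def rot_x(angle):
    """Rotation about the camera x-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rotation_homography(f, cx, cy, angle):
    """Projective warp H = K R K^-1 for a virtual camera rotation (assumed model)."""
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    return matmul(matmul(K, rot_x(angle)), K_inv)

def warp_point(H, x, y):
    """Apply a homography to a pixel and dehomogenize."""
    u, v, w = (H[i][0] * x + H[i][1] * y + H[i][2] for i in range(3))
    return u / w, v / w

def essential_matrix(R, t):
    """E = [t]x R for the assumed convention X2 = R X1 + t."""
    tx, ty, tz = t
    skew = [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]
    return matmul(skew, R)

def epipolar_residual(E, x1, x2):
    """Algebraic residual x2^T E x1 for normalized homogeneous points."""
    Ex1 = [sum(E[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * Ex1[i] for i in range(3))

def project(X):
    """Normalized image coordinates of a 3D point in camera coordinates."""
    return [X[0] / X[2], X[1] / X[2], 1.0]

# Synthetic two-view setup: one 3D point seen by two calibrated cameras.
R = rot_x(0.4)
t = (1.0, 0.1, 0.2)
X1 = [0.5, -0.2, 4.0]
X2 = [sum(R[i][j] * X1[j] for j in range(3)) + t[i] for i in range(3)]
E = essential_matrix(R, t)

inlier = epipolar_residual(E, project(X1), project(X2))
wrong = project(X2)
wrong[1] += 0.1  # simulate an incorrect correspondence
outlier = epipolar_residual(E, project(X1), wrong)
```

In a verification step, a correspondence would be kept only if its residual (or a geometric error derived from it) stays below a threshold; the inlier above yields a residual near zero, while the perturbed match does not.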


Wide-baseline image matching with projective view synthesis and calibrated geometric verification. Image matching is a fundamental task in photogrammetry and computer vision. While effective solutions exist for narrow-baseline acquisition conditions, using detectors based, for example, on differences of Gaussians (DoG) and descriptors such as the scale-invariant feature transform (SIFT), the task remains a challenge for wide-baseline configurations. This holds in particular when UAV-based (unmanned aerial vehicle) images are combined with images taken from the ground. In this contribution, we propose a method for wide-baseline image matching that extends the current state-of-the-art approach Matching on Demand with View Synthesis (MODS) so that even more extreme wide-baseline problems can be solved. We achieve this (1) by using projective transformations during view synthesis to overcome limitations caused by the approximate character of affine transformations, and (2) by estimating the essential matrix within geometric verification to filter incorrect correspondences more robustly when the camera calibration is known. We have evaluated our approach on several image pairs with extremely different viewing directions, most of which consist of one UAV-based image and one image taken from the ground, and demonstrate improved performance of our method compared to MODS.





Author information



Corresponding author

Correspondence to Lukas Roth.


About this article


Cite this article

Roth, L., Kuhn, A. & Mayer, H. Wide-Baseline Image Matching with Projective View Synthesis and Calibrated Geometric Verification. PFG 85, 85–95 (2017).



  • Image matching
  • Wide-baseline image matching
  • Local feature detectors
  • Local feature descriptors

