Large-Scale Visual Geo-Localization

Part of the series Advances in Computer Vision and Pattern Recognition, pp. 21-40


Discovering Mid-level Visual Connections in Space and Time

  • Yong Jae Lee (Department of Computer Science, UC Davis)
  • Alexei A. Efros (Department of Electrical Engineering and Computer Science, UC Berkeley)
  • Martial Hebert (Robotics Institute, Carnegie Mellon University)


Finding recurring visual patterns in data underlies much of modern computer vision. The emerging subfield of visual category discovery, or visual data mining, aims to cluster visual patterns that capture more complex appearance than low-level blobs, corners, or oriented bars, without requiring any semantic labels. In particular, mid-level visual elements have recently been proposed as a new type of visual primitive and have been shown to be useful for various recognition tasks. Because the visual elements are discovered automatically from the data, their representation is flexible: an element can be a part, an object, or a group of objects. In this chapter, we explore what the mid-level visual representation brings to geo-spatial and longitudinal analyses. Specifically, we present a weakly supervised visual data mining approach that discovers connections between recurring mid-level visual elements in historic (temporal) and geographic (spatial) image collections, and attempts to capture the underlying visual style. In contrast to existing discovery methods, which mine for patterns that remain visually consistent throughout the dataset, our goal is to discover visual elements whose appearance changes with time or location, i.e., elements that exhibit consistent stylistic variations across the label space (date or geo-location). To discover these elements, we first identify groups of patches that are style-sensitive. We then incrementally build correspondences to find the same element across the entire dataset. Finally, we train style-aware regressors that model each element's range of stylistic differences. We apply our approach to date and geo-location prediction and show substantial improvement over several baselines that do not model visual style. We also demonstrate the method's effectiveness on the related task of fine-grained classification.
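
As a rough illustration of the first and last steps named in the abstract, the following minimal Python sketch scores how style-sensitive a patch is and fits a simple regressor from appearance to date. It is not the chapter's implementation. The entropy-based score, the function names (`label_entropy`, `style_sensitivity`, `fit_line`), the toy 2-D features, and the use of an ordinary least-squares line in place of the style-aware regressors are all assumptions made for illustration; the abstract does not specify any of them. The correspondence-building step is omitted.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy, in bits, of a list of discrete labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())

def style_sensitivity(seed_idx, features, labels, k):
    """Assumed score: 1 when the seed's k nearest patches share one label,
    0 when they are spread evenly over all labels in the dataset."""
    n_bins = len(set(labels))
    if n_bins < 2:
        return 0.0
    seed = features[seed_idx]
    # Rank all patches (the seed included) by Euclidean distance to the seed.
    order = sorted(range(len(features)),
                   key=lambda i: math.dist(seed, features[i]))
    neighbor_labels = [labels[i] for i in order[:k]]
    return 1.0 - label_entropy(neighbor_labels) / math.log2(n_bins)

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; a stand-in
    for a style-aware regressor over a 1-D appearance feature."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical toy data: 2-D patch descriptors with a decade label each.
features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),              # look alike, all 1920
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),              # look alike, all 1950
    (9.0, 0.0), (9.1, 0.0), (9.0, 0.1), (9.1, 0.1),  # look alike, mixed dates
]
labels = [1920, 1920, 1920, 1950, 1950, 1950, 1920, 1950, 1920, 1950]

sensitive = style_sensitivity(0, features, labels, k=3)
insensitive = style_sensitivity(6, features, labels, k=3)
print(f"style-sensitive seed:   {sensitive:.2f}")
print(f"style-insensitive seed: {insensitive:.2f}")
```

In this toy setup, a patch whose nearest neighbors all come from one decade scores high, while a patch that looks the same in every decade scores low, which is the distinction the abstract draws between style-sensitive elements and patterns that stay visually consistent across the dataset.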