Discovering Mid-level Visual Connections in Space and Time

  • Yong Jae Lee
  • Alexei A. Efros
  • Martial Hebert
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)


Finding recurring visual patterns in data underlies much of modern computer vision. The emerging subfield of visual category discovery, or visual data mining, aims to cluster visual patterns that capture more complex appearance than low-level blobs, corners, or oriented bars, without requiring any semantic labels. In particular, mid-level visual elements have recently been proposed as a new type of visual primitive and have been shown to be useful for various recognition tasks. Because the visual elements are discovered automatically from the data, their representation is flexible: an element may be a part, an object, a group of objects, and so on. In this chapter, we explore what the mid-level visual representation brings to geo-spatial and longitudinal analyses. Specifically, we present a weakly supervised visual data mining approach that discovers connections between recurring mid-level visual elements in historic (temporal) and geographic (spatial) image collections, and attempts to capture the underlying visual style. In contrast to existing discovery methods, which mine for patterns that remain visually consistent throughout the dataset, our goal is to discover visual elements whose appearance changes with time or location, i.e., elements that exhibit consistent stylistic variations across the label space (date or geo-location). To discover these elements, we first identify groups of patches that are style-sensitive. We then incrementally build correspondences to find the same element across the entire dataset. Finally, we train style-aware regressors that model each element’s range of stylistic differences. We apply our approach to date and geo-location prediction and show substantial improvement over several baselines that do not model visual style. We also demonstrate the method’s effectiveness on the related task of fine-grained classification.
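As a rough illustration of the first step (identifying style-sensitive groups of patches), the sketch below ranks patch clusters by how concentrated their labels are. The criterion is an assumption made for illustration and is not stated in this text: a cluster whose members share nearly the same date or location label has low label entropy and is treated as style-sensitive. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_style_sensitive(clusters):
    """Return cluster ids sorted from lowest to highest label entropy.

    Assumption: low entropy (labels concentrated in few bins) is used
    here as a stand-in for "style-sensitive".
    """
    return sorted(clusters, key=lambda cid: label_entropy(clusters[cid]))


# Hypothetical toy data: each cluster lists the decade labels of its patches.
clusters = {
    "cluster_a": [1950, 1950, 1950, 1960],  # concentrated in one decade
    "cluster_b": [1930, 1950, 1970, 1990],  # spread evenly over decades
}

ranking = rank_style_sensitive(clusters)
print(ranking)
```

Under this assumed criterion, clusters at the front of the ranking would be the candidates passed on to the correspondence-building and regression steps.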


Keywords (added by machine, not by the authors): Visual pattern, Generic detector, Visual element, Spatial pyramid, Detection score



We thank Olivier Duchenne for helpful discussions. This work was supported in part by Google, ONR MURI N000141010934, and the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory. The U.S. government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL or the U.S. government.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Yong Jae Lee (1)
  • Alexei A. Efros (2)
  • Martial Hebert (3)
  1. Department of Computer Science, UC Davis, Davis, USA
  2. Department of Electrical Engineering and Computer Science, UC Berkeley, Berkeley, USA
  3. Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
