Collaborative Multimodal Location Estimation of Consumer Media

  • Venkatesan Ekambaram
  • Kannan Ramchandran
  • Jaeyoung Choi
  • Gerald Friedland


With the emergence of Web 2.0 and with GPS devices becoming ubiquitous in daily life, location-based services are rapidly gaining traction in the online world. The main driving force behind these services is that they enable a highly personalized experience. Social-media websites such as Flickr, YouTube, and Twitter allow queries for results originating at a certain location. Likewise, retrofitting existing archives with location information is expected to be attractive to many businesses and to enable new applications. The task of estimating the geo-coordinates of a media recording goes by different names, such as “geo-tagging”, “location estimation” or “placing”. Geo-tagging multimedia content has various applications. For example, geo-location services can be provided for media captured in environments without GPS, such as photos taken indoors on mobile phones. Vacation videos and photos can be better organized and presented to the user if they carry geo-location information. With the explosive growth of multimedia content on the Internet (200 million photos are uploaded to Facebook daily), there is a pressing need for efficient organization and retrieval of that content, which geo-tagging can enable. Geo-location information also helps in developing a better semantic understanding of multimedia content. These are some of the main motivations of the MediaEval Placing task [1, 2].


Keywords: Graphical Model, Gaussian Mixture Model, Query Image, Multimedia Content, Test Video
These keywords were added by machine and not by the authors.


References

  1. Mediaeval web site,
  2. M. Larson, M. Soleymani, P. Serdyukov, S. Rudinac, C. Wartena, V. Murdock, G. Friedland, R. Ordelman, G.J.F. Jones, Automatic Tagging and Geo-Tagging in Video Collections and Communities, in ACM International Conference on Multimedia Retrieval (ICMR 2011), April 2011, to appear
  3. G. Friedland, O. Vinyals, T. Darrell, Multimodal Location Estimation, in Proceedings of ACM Multimedia, 2010, pp. 1245–1251
  4. G. Schindler, M. Brown, R. Szeliski, City-scale location recognition, in IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–7
  5. W. Zhang, J. Kosecka, Image based localization in urban environments, in 3D Data Processing, Visualization, and Transmission, 3rd Intl. Symposium on, 2006, pp. 33–40
  6. J. Hays, A.A. Efros, IM2GPS: estimating geographic information from a single image, in IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8
  7. J. Luo, D. Joshi, J. Yu, A. Gallagher, Geotagging in multimedia and computer vision-a survey. Multimed. Tools Appl. 51, 187–211 (2011)
  8. T. Rattenbury, M. Naaman, Methods for extracting place semantics from Flickr tags. ACM Trans. Web (TWEB) 3(1), 1–30 (2009)
  9. P. Serdyukov, V. Murdock, R. van Zwol, Placing Flickr photos on a map, in ACM SIGIR, 2009, pp. 484–491
  10. O. Van Laere, S. Schockaert, B. Dhoedt, Ghent University at the 2011 placing task, in Proceedings of MediaEval, 2011
  11. L. Cao, J. Yu, J. Luo, T. Huang, Enhancing semantic and geographic annotation of web images via logistic canonical correlation regression, in Proceedings of the 17th ACM International Conference on Multimedia, New York, NY, USA, 2009, MM ’09, pp. 125–134, ACM
  12. A. Gallagher, D. Joshi, J. Yu, J. Luo, Geo-location inference from image content and user tags, in Proceedings of IEEE CVPR, 2009, IEEE
  13. D.J. Crandall, L. Backstrom, D. Huttenlocher, J. Kleinberg, Mapping the world’s photos, in Proceedings of WWW ’09, New York, NY, USA, 2009, pp. 761–770, ACM
  14. P. Kelm, S. Schmiedeke, T. Sikora, A hierarchical, multi-modal approach for placing videos on the map using millions of Flickr photographs, in Proceedings of SBNMA ’11, New York, NY, USA, 2011, pp. 15–20, ACM
  15. G. Friedland, J. Choi, H. Lei, A. Janin, Multimodal Location Estimation on Flickr Videos, in Proceedings of the 2011 ACM Workshop on Social Media, Scottsdale, Arizona, USA, 2011, pp. 23–28, ACM
  16. J. Choi, H. Lei, G. Friedland, The 2011 ICSI Video Location Estimation System, in Proceedings of MediaEval, 2011
  17. M.J. Wainwright, M.I. Jordan, Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1, 1–305 (2008)
  18. L. Song, A. Gretton, D. Bickson, Y. Low, C. Guestrin, Kernel belief propagation, in International Conference on Artificial Intelligence and Statistics, 2011, pp. 707–715
  19. N. Vlassis, A. Likas, A greedy EM algorithm for Gaussian mixture learning. Neural Process. Lett. 15(1), 77–87 (2002)
  20. A. Rae, V. Murdock, P. Serdyukov, Working Notes for the Placing Task at MediaEval 2011, in MediaEval 2011 Workshop (Pisa, Italy, September 2011)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Venkatesan Ekambaram (1)
  • Kannan Ramchandran (1)
  • Jaeyoung Choi (2)
  • Gerald Friedland (2)

  1. University of California, Berkeley, USA
  2. International Computer Science Institute, Berkeley, USA
