The Benchmark as a Research Catalyst: Charting the Progress of Geo-prediction for Social Multimedia

Multimodal Location Estimation of Videos and Images

Abstract

Benchmarks have the power to bring research communities together to focus on specific research challenges. They drive research forward by making it easier to systematically compare and contrast new solutions and to evaluate their performance against the existing state of the art. In this chapter, we present a retrospective on the Placing Task, a yearly challenge offered by the MediaEval Multimedia Benchmark. The Placing Task, launched in 2010, is a benchmarking task that requires participants to develop algorithms that automatically predict the geolocation of social multimedia (videos and images). This chapter covers the editions of the Placing Task offered in 2010–2013, and also presents an outlook on 2014. We present the formulation of the task and the task dataset for each year, tracing the design decisions made by the organizers and how each year built on the previous one. Finally, we provide a summary of future directions and challenges for multimodal geolocation, and concluding remarks on how benchmarking has catalyzed progress in geolocation prediction for social multimedia.

Notes

  1. http://multimediaeval.org.
  2. http://multimediaeval.org.
  3. http://www.multimediaeval.org.
  4. http://trec.nist.gov.
  5. http://research.nii.ac.jp/ntcir/.
  6. http://www.clef-campaign.org.
  7. http://research.microsoft.com/en-us/um/people/szeliski/visioncontest05/default.htm.
  8. http://multimediaeval.org/mediaeval2010/placing.
  9. http://www.geonames.org.
  10. http://multimediaeval.org/mediaeval2011/placing2011/.
  11. http://multimediaeval.org/mediaeval2012/placing2012.
  12. http://multimediaeval.org/mediaeval2013/placing2013.
  13. Available for download: http://www.st.ewi.tudelft.nl/~hauff/placingTask2013Data.html.
  14. The baseline runs used out-of-the-box location prediction software: https://github.com/chauff/ImageLocationEstimation, with geographic filtering enabled.
  15. http://multimediaeval.org/mediaeval2014/placing2014.
  16. http://webscope.sandbox.yahoo.com/catalog.php?datatype=i&did=67.
  17. http://www.icsi.berkeley.edu/icsi/.
  18. https://www.llnl.gov.
  19. http://www.gps.gov/systems/gps/performance/accuracy.
  20. http://www.gps.gov/systems/augmentations.
  21. https://www.youtube.com.
  22. https://www.facebook.com.
  23. https://vimeo.com.
  24. http://blip.tv.
  25. http://instagram.com.
  26. https://vine.co.
  27. http://www.nasa.gov/mission_pages/station.

Author information

Corresponding author

Correspondence to Martha Larson.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Larson, M. et al. (2015). The Benchmark as a Research Catalyst: Charting the Progress of Geo-prediction for Social Multimedia. In: Choi, J., Friedland, G. (eds) Multimodal Location Estimation of Videos and Images. Springer, Cham. https://doi.org/10.1007/978-3-319-09861-6_2

  • DOI: https://doi.org/10.1007/978-3-319-09861-6_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09860-9

  • Online ISBN: 978-3-319-09861-6

  • eBook Packages: Engineering, Engineering (R0)