The Benchmark as a Research Catalyst: Charting the Progress of Geo-prediction for Social Multimedia

  • Martha Larson
  • Pascal Kelm
  • Adam Rae
  • Claudia Hauff
  • Bart Thomee
  • Michele Trevisiol
  • Jaeyoung Choi
  • Olivier Van Laere
  • Steven Schockaert
  • Gareth J.F. Jones
  • Pavel Serdyukov
  • Vanessa Murdock
  • Gerald Friedland

Abstract

Benchmarks have the power to bring research communities together to focus on specific research challenges. They drive research forward by making it easier to compare new solutions systematically and to evaluate their performance against the existing state of the art. In this chapter, we present a retrospective on the Placing Task, a yearly challenge offered by the MediaEval Multimedia Benchmark. Launched in 2010, the Placing Task is a benchmarking task that requires participants to develop algorithms that automatically predict the geolocation of social multimedia (videos and images). This chapter covers the editions of the Placing Task offered in 2010–2013 and presents an outlook on 2014. We present the formulation of the task and the task dataset for each year, tracing the design decisions made by the organizers and how each edition built on the previous one. Finally, we summarize future directions and challenges for multimodal geolocation, and offer concluding remarks on how benchmarking has catalyzed progress in geolocation prediction for social multimedia.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Martha Larson (1)
  • Pascal Kelm (2)
  • Adam Rae (3)
  • Claudia Hauff (1)
  • Bart Thomee (4)
  • Michele Trevisiol (5)
  • Jaeyoung Choi (7)
  • Olivier Van Laere (6)
  • Steven Schockaert (8)
  • Gareth J.F. Jones (9)
  • Pavel Serdyukov (10)
  • Vanessa Murdock (11)
  • Gerald Friedland (7)

  1. Delft University of Technology, Delft, The Netherlands
  2. Technische Universität, Berlin, Germany
  3. Future Cities Catapult, London, UK
  4. Yahoo Labs, San Francisco, USA
  5. Pompeu Fabra University, Barcelona, Spain
  6. Yahoo Labs, Barcelona, Spain
  7. ICSI, Berkeley, USA
  8. Cardiff University, Cardiff, UK
  9. Dublin City University, Dublin, Ireland
  10. Yandex, Moscow, Russia
  11. Microsoft, Bellevue, USA