Leveraging Semantic Annotations to Link Wikipedia and News Archives

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9626)


The sheer volume of information available online makes it difficult to look back on past events. We propose a novel linking problem: connecting Wikipedia excerpts that summarize events to online news articles that elaborate on them. We cast this problem as an information retrieval task, treating a given excerpt as a user query with the goal of retrieving a ranked list of relevant news articles. We find that the textual descriptions of Wikipedia excerpts often carry additional semantics representing the time, geolocations, and named entities involved in the event. Our retrieval model treats text and semantic annotations as different dimensions of an event and estimates an independent query model for each dimension to rank documents. In our experiments on two datasets, we compare methods that consider different combinations of dimensions and find that the approach that leverages all dimensions suits our problem best.
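One way to picture ranking with independent per-dimension query models is the minimal Python sketch below. It is an illustration under stated assumptions, not the model from the paper: the abstract does not specify the estimation or combination method, so the sketch assumes maximum-likelihood query models, additively smoothed document models, and an unweighted sum of per-dimension scores. All names (`query_model`, `dimension_score`, `rank`) and the toy data are hypothetical.

```python
import math
from collections import Counter

# Assumed dimensions, following the abstract: text, time, geolocation, named entities.
DIMENSIONS = ("text", "time", "geo", "entity")


def query_model(tokens):
    """Maximum-likelihood distribution over the query tokens of one dimension."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def dimension_score(q_model, doc_tokens, vocab_size, mu=1.0):
    """Sum over query tokens of P(t|Q) * log P(t|D), with additive smoothing (assumed)."""
    counts = Counter(doc_tokens)
    denom = len(doc_tokens) + mu * vocab_size
    return sum(p * math.log((counts[tok] + mu) / denom) for tok, p in q_model.items())


def rank(query, docs, dims=DIMENSIONS):
    """Score each document per dimension independently, then add the scores."""
    scores = {doc_id: 0.0 for doc_id in docs}
    for dim in dims:
        # Vocabulary of this dimension, used only for smoothing.
        vocab = set(query.get(dim, []))
        for doc in docs.values():
            vocab.update(doc.get(dim, []))
        q_model = query_model(query.get(dim, []))
        for doc_id, doc in docs.items():
            scores[doc_id] += dimension_score(q_model, doc.get(dim, []), len(vocab))
    return sorted(scores, key=scores.get, reverse=True)


# Toy, hand-annotated data (hypothetical): one excerpt and three news articles.
query = {
    "text": ["earthquake", "tsunami", "japan"],
    "time": ["2011-03"],
    "geo": ["Japan"],
    "entity": ["Fukushima_Daiichi"],
}
docs = {
    "a": {"text": ["earthquake", "tsunami", "hits", "japan"],
          "time": ["2011-03"], "geo": ["Japan"], "entity": ["Fukushima_Daiichi"]},
    "b": {"text": ["earthquake", "drill", "in", "california"],
          "time": ["2009-10"], "geo": ["California"], "entity": ["USGS"]},
    "c": {"text": ["tsunami", "warning", "japan"],
          "time": ["2011-03"], "geo": ["Japan"], "entity": []},
}

ranking = rank(query, docs)
text_only_ranking = rank(query, docs, dims=("text",))
print(ranking)
```

Passing a subset of dimensions to `rank`, as in `text_only_ranking`, mirrors the comparison of different dimension combinations mentioned in the abstract. A real system would also need to obtain the annotations automatically and weight the dimensions, which this sketch leaves out.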



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Max Planck Institute for Informatics, Saarbrücken, Germany
