Leveraging Semantic Annotations to Link Wikipedia and News Archives

  • Conference paper
Advances in Information Retrieval (ECIR 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Abstract

The overwhelming amount of information available online has made it difficult to look back on past events. We propose a novel linking problem: connecting excerpts from Wikipedia that summarize events to online news articles that elaborate on them. To address this problem, we cast it as an information retrieval task, treating a given excerpt as a user query with the goal of retrieving a ranked list of relevant news articles. We find that Wikipedia excerpts often carry additional semantics in their textual descriptions, representing the time, geolocations, and named entities involved in the event. Our retrieval model leverages text and semantic annotations as different dimensions of an event by estimating independent query models to rank documents. In our experiments on two datasets, we compare methods that consider different combinations of dimensions and find that the approach that leverages all dimensions suits our problem best.
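The idea of estimating one query model per dimension and combining them to rank documents can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulation, so everything below is an assumption made for illustration: the function names (`unigram_model`, `dimension_score`, `rank`), the treatment of time and geolocation as discrete tokens, the linear smoothing weight `lam`, and the unweighted sum over dimensions are all hypothetical choices, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical dimension names; the abstract mentions text, time,
# geolocations, and named entities.
DIMENSIONS = ("text", "time", "geo", "entity")


def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}


def dimension_score(query_model, doc_model, background, lam=0.5, floor=1e-9):
    """Negative cross-entropy of the query model under a smoothed document model.

    Assumption: linear interpolation with a collection (background) model;
    tokens unseen in the whole collection fall back to a small floor.
    """
    score = 0.0
    for tok, p_query in query_model.items():
        p_doc = lam * doc_model.get(tok, 0.0) + (1 - lam) * background.get(tok, floor)
        score += p_query * math.log(p_doc)
    return score


def rank(query, docs, dimensions=DIMENSIONS):
    """Rank documents by the sum of independent per-dimension scores.

    query: dict mapping dimension name -> list of tokens
    docs:  dict mapping doc id -> dict of dimension name -> list of tokens
    """
    # One background model per dimension, estimated from all documents.
    background = {
        dim: unigram_model([t for d in docs.values() for t in d.get(dim, [])])
        for dim in dimensions
    }
    # One independent query model per dimension.
    query_models = {dim: unigram_model(query.get(dim, [])) for dim in dimensions}
    scored = []
    for doc_id, doc in docs.items():
        total = sum(
            dimension_score(query_models[dim], unigram_model(doc.get(dim, [])), background[dim])
            for dim in dimensions
        )
        scored.append((doc_id, total))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank(query, docs, dimensions=("text",))` restricts the score to a single dimension, which is one simple way to compare different combinations of dimensions against the variant that uses all of them.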

Notes

  1. http://en.wikipedia.org/wiki/Portal:Currentevents.

  2. http://resources.mpi-inf.mpg.de/d5/linkingWiki2News/.

  3. http://corpus.nytimes.com.

  4. http://www.lemurproject.org/clueweb12.php/.

  5. http://www.crowdflower.com/.

  6. http://lemurproject.org/clueweb12/FACC1/.

  7. https://github.com/geoparser/geolocator.

  8. http://www.geonames.org/.

  9. http://nlp.stanford.edu/software/corenlp.shtml.

Author information

Corresponding author

Correspondence to Arunav Mishra.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Mishra, A., Berberich, K. (2016). Leveraging Semantic Annotations to Link Wikipedia and News Archives. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_3

  • DOI: https://doi.org/10.1007/978-3-319-30671-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30670-4

  • Online ISBN: 978-3-319-30671-1

  • eBook Packages: Computer Science, Computer Science (R0)
