Context Based Wikipedia Linking
Wikipedia pages can be linked automatically either with a content-based approach that exploits word similarities or with a structure-based approach that exploits characteristics of the link graph. Our approach follows a content-based strategy: it detects Wikipedia titles as link candidates and selects the most relevant ones as links. Relevance is calculated from the context, i.e. the text surrounding a link candidate. Our goal was to evaluate the influence of the link context on selecting relevant links and on determining a link's best entry point. Results show that a whole Wikipedia page provides the best context for resolving links and that straightforward inverse document frequency based scoring of anchor texts achieves around 4% less Mean Average Precision on the provided data set.
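The baseline mentioned in the abstract, inverse document frequency scoring of anchor texts, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tokenize`, `idf_table`, `find_candidates`, `score_candidates`), the smoothed IDF formula, and averaging IDF over the terms of a title are all assumptions chosen for illustration. The sketch detects page titles occurring in a text as link candidates and ranks them so that titles made of rarer terms come first.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(documents):
    """Return (idf, n) where idf maps each term to log((n + 1) / (df + 1)) + 1.

    The smoothing is an assumption; the abstract does not specify a formula.
    """
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))  # count each term once per document
    idf = {term: math.log((n + 1) / (count + 1)) + 1.0 for term, count in df.items()}
    return idf, n


def find_candidates(text, titles):
    """Return (token_position, title) for every title occurring in the text."""
    tokens = tokenize(text)
    title_index = {tuple(tokenize(t)): t for t in titles}
    max_len = max((len(key) for key in title_index), default=0)
    found = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            key = tuple(tokens[start:start + length])
            if len(key) == length and key in title_index:
                found.append((start, title_index[key]))
    return found


def score_candidates(text, titles, documents):
    """Rank detected titles by the mean IDF of their terms, highest first."""
    idf, n = idf_table(documents)
    unseen = math.log(n + 1) + 1.0  # IDF of a term that occurs in no document
    scores = {}
    for _, title in find_candidates(text, titles):
        terms = tokenize(title)
        scores[title] = sum(idf.get(t, unseen) for t in terms) / len(terms)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy collection (hypothetical data, for illustration only).
documents = [
    "the cat sat",
    "the dog ran",
    "the information retrieval system",
]
titles = ["Information retrieval", "The"]
ranked = score_candidates("The information retrieval task", titles, documents)
```

In this toy run both titles are detected, and "Information retrieval" ranks above "The" because its terms occur in fewer documents. A context-based variant, as studied in the paper, would replace the IDF score with a relevance score computed from the text surrounding each candidate.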
Keywords: INEX, Link-the-Wiki, Context Exploitation