An Exploration of Learning to Link with Wikipedia: Features, Methods and Training Collection
We describe our participation in the Link-the-Wiki track at INEX 2009. We apply machine learning methods to the anchor-to-best-entry-point task and explore the impact of three aspects of our approaches: the features, the learning methods, and the collection used for training the models. We find that a learning to rank-based approach and a binary classification approach do not differ much in performance. The new Wikipedia collection, which is larger and contains more links than the previously used collection, provides better training material for our models. In addition, a heuristic run that combines the two intuitively most useful features outperforms the machine learning-based runs, which suggests that further analysis and selection of features is necessary.
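The abstract contrasts a learning to rank approach with a binary classification approach but does not spell out how their training instances differ. The following is a minimal sketch under stated assumptions, not the method used in the paper: candidate target pages are grouped per anchor, each (anchor, candidate) pair has a small feature vector, and a plain linear perceptron stands in for the actual learners. The classifier labels each candidate independently, while the pairwise ranker learns from feature differences between a correct and an incorrect target of the same anchor. All names (`Candidate`, `pointwise_instances`, `pairwise_instances`, `best_target`, the targets `T1`, `N1`, ...) and the two features are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vector = List[float]


@dataclass
class Candidate:
    anchor_id: str    # which anchor occurrence this candidate belongs to
    target: str       # candidate target page (hypothetical titles below)
    features: Vector  # hypothetical feature vector for the (anchor, target) pair
    is_target: bool   # ground truth: is this the correct link target?


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pointwise_instances(cands: List[Candidate]) -> List[Tuple[Vector, int]]:
    """Binary classification: each candidate is one example labelled +1/-1."""
    return [(c.features, 1 if c.is_target else -1) for c in cands]


def pairwise_instances(cands: List[Candidate]) -> List[Tuple[Vector, int]]:
    """Pairwise learning to rank: within one anchor, the feature difference
    (correct - incorrect) is labelled +1 and its negation -1."""
    by_anchor: Dict[str, List[Candidate]] = {}
    for c in cands:
        by_anchor.setdefault(c.anchor_id, []).append(c)
    pairs: List[Tuple[Vector, int]] = []
    for group in by_anchor.values():
        pos = [c for c in group if c.is_target]
        neg = [c for c in group if not c.is_target]
        for p in pos:
            for n in neg:
                diff = [a - b for a, b in zip(p.features, n.features)]
                pairs.append((diff, 1))
                pairs.append(([-d for d in diff], -1))
    return pairs


def train_perceptron(data, epochs: int = 20, use_bias: bool = True) -> Vector:
    """Plain perceptron; the last weight is the bias (disabled for pairs,
    because a constant offset cancels out in a difference vector)."""
    w = [0.0] * (len(data[0][0]) + 1)
    for _ in range(epochs):
        for x, y in data:
            xb = x + [1.0 if use_bias else 0.0]
            if y * dot(w, xb) <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, xb)]
    return w


def score(w: Vector, x: Vector) -> float:
    return dot(w[:-1], x) + w[-1]


def best_target(w: Vector, cands: List[Candidate], anchor_id: str) -> str:
    """Pick the highest-scoring candidate for one anchor."""
    group = [c for c in cands if c.anchor_id == anchor_id]
    return max(group, key=lambda c: score(w, c.features)).target


# Toy data: two anchors, three candidate targets each, two made-up features.
cands = [
    Candidate("a1", "T1", [0.9, 0.8], True),
    Candidate("a1", "N1", [0.2, 0.1], False),
    Candidate("a1", "N2", [0.4, 0.3], False),
    Candidate("a2", "T2", [0.7, 0.9], True),
    Candidate("a2", "N3", [0.3, 0.2], False),
    Candidate("a2", "N4", [0.1, 0.5], False),
]

w_clf = train_perceptron(pointwise_instances(cands), use_bias=True)
w_rank = train_perceptron(pairwise_instances(cands), use_bias=False)
for anchor in ("a1", "a2"):
    print(anchor, best_target(w_clf, cands, anchor), best_target(w_rank, cands, anchor))
```

On this toy data both models select the correct target for each anchor, which loosely mirrors the finding that the two formulations behave similarly. The design difference is that the ranker only needs the correct target to outscore its competitors for the same anchor, rather than to clear a global decision threshold, which matches the goal of choosing one best entry point per anchor.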