Exploiting Locality of Wikipedia Links in Entity Ranking

  • Conference paper
Advances in Information Retrieval (ECIR 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4956)

Abstract

Information retrieval from web and XML document collections is ever more focused on returning entities instead of web pages or XML elements. There are many research fields involving named entities; one such field is known as entity ranking, where one goal is to rank entities in response to a query supported with a short list of entity examples. In this paper, we describe our approach to ranking entities from the Wikipedia XML document collection. Our approach utilises the known categories and the link structure of Wikipedia, and more importantly, exploits link co-occurrences to improve the effectiveness of entity ranking. Using the broad context of a full Wikipedia page as a baseline, we evaluate two different algorithms for identifying narrow contexts around the entity examples: one that uses predefined types of elements such as paragraphs, lists and tables; and another that dynamically identifies the contexts by utilising the underlying XML document structure. Our experiments demonstrate that the locality of Wikipedia links can be exploited to significantly improve the effectiveness of entity ranking.
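The contrast between a broad page-level context and narrow element-level contexts can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `cooccurrence_scores`, the toy page, and the simple counting scheme are all assumptions made for illustration. It only shows how restricting co-occurrence to elements such as paragraphs, lists and tables can filter out links that merely share a page with an entity example.

```python
from collections import Counter


def cooccurrence_scores(contexts, examples):
    """Count, per candidate link, the contexts it shares with an entity example.

    `contexts` is a list of link-target lists, one per context.
    Hypothetical scoring: one point per shared context.
    """
    examples = set(examples)
    scores = Counter()
    for links in contexts:
        links = set(links)
        # Only contexts that contain at least one example contribute.
        if links & examples:
            for target in links - examples:
                scores[target] += 1
    return scores


# Hypothetical page: each inner list holds the link targets of one element.
page_elements = [
    ["France", "Germany", "Italy"],     # a list of countries
    ["Euro", "European Central Bank"],  # a paragraph about currency
    ["Spain", "Germany"],               # a table row
]
examples = ["Germany"]

# Broad context: the whole page treated as a single context.
all_links = [link for element in page_elements for link in element]
broad = cooccurrence_scores([all_links], examples)

# Narrow contexts: each element treated as its own context.
narrow = cooccurrence_scores(page_elements, examples)

print(sorted(broad))
print(sorted(narrow))
```

In this toy page, the broad context credits every link on the page, including "Euro", while the narrow contexts credit only links that appear in the same element as the example "Germany".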

Editor information

Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian Ruthven, Ryen W. White

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pehcevski, J., Vercoustre, A.-M., Thom, J.A. (2008). Exploiting Locality of Wikipedia Links in Entity Ranking. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds) Advances in Information Retrieval. ECIR 2008. Lecture Notes in Computer Science, vol 4956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78646-7_25

  • DOI: https://doi.org/10.1007/978-3-540-78646-7_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78645-0

  • Online ISBN: 978-3-540-78646-7

  • eBook Packages: Computer Science, Computer Science (R0)
