Exploiting Wikipedia for Entity Name Disambiguation in Tweets

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2014)

Abstract

Social media repositories are a significant source of evidence when extracting information about the reputation of a particular entity (e.g., a politician, singer or company). Reputation management experts need automated methods for mining these repositories, Twitter in particular, to monitor an entity's reputation. A significant research challenge here is to disambiguate tweets with respect to entity names. To address it, we use “context phrases” in a tweet together with Wikipedia disambiguated articles for a particular entity as features in a random forest classifier. We also use the concept of “relatedness” between a tweet and an entity, computed from the Wikipedia category-article structure, which captures the amount of discussion in a tweet that relates to the entity. The experimental evaluations show a significant improvement over the baseline and performance comparable with other systems, which is a strong result given that we restrict ourselves to features extracted from Wikipedia.
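
To make the idea of matching a tweet's context phrases against entity-specific phrases concrete, the following is a minimal sketch and not the paper's implementation, whose exact feature definitions are not given in this abstract. It assumes that context phrases are word n-grams and that a set of phrases for the entity has already been collected from Wikipedia; the function names (`tokenize`, `context_phrases`, `relatedness`) and the hand-written phrase set are hypothetical. A score of this kind could serve as one input feature to a classifier such as a random forest.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def context_phrases(text, max_n=3):
    """Return the set of word n-grams (n = 1..max_n) found in the text."""
    tokens = tokenize(text)
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def relatedness(tweet, entity_phrases, max_n=3):
    """Return the matched phrases and the fraction of tweet n-grams matched.

    entity_phrases is assumed to be a collection of lowercase phrases
    gathered beforehand for one entity (here it is simply hand-written).
    """
    phrases = context_phrases(tweet, max_n)
    matched = phrases & set(entity_phrases)
    score = len(matched) / len(phrases) if phrases else 0.0
    return matched, score


# Hypothetical phrase set standing in for phrases mined from Wikipedia.
APPLE_INC_PHRASES = {"iphone", "keynote", "tim cook", "app store"}

company_match, company_score = relatedness(
    "Apple unveils new iPhone at keynote", APPLE_INC_PHRASES
)
fruit_match, fruit_score = relatedness(
    "I ate an apple pie today", APPLE_INC_PHRASES
)
```

In this toy setting the first tweet shares phrases with the entity's phrase set while the second shares none, so the score separates the company sense of the name from the fruit sense. A real system would need a far larger phrase set and additional features.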

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Qureshi, M.A., O’Riordan, C., Pasi, G. (2014). Exploiting Wikipedia for Entity Name Disambiguation in Tweets. In: Métais, E., Roche, M., Teisseire, M. (eds) Natural Language Processing and Information Systems. NLDB 2014. Lecture Notes in Computer Science, vol 8455. Springer, Cham. https://doi.org/10.1007/978-3-319-07983-7_25

  • DOI: https://doi.org/10.1007/978-3-319-07983-7_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07982-0

  • Online ISBN: 978-3-319-07983-7

  • eBook Packages: Computer Science (R0)
