Exploiting Wikipedia for Entity Name Disambiguation in Tweets

  • Muhammad Atif Qureshi
  • Colm O’Riordan
  • Gabriella Pasi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8455)


Social media repositories are a significant source of evidence when extracting information about the reputation of a particular entity (e.g., a politician, singer, or company). Reputation management experts need automated methods for mining these repositories, Twitter in particular, to monitor an entity's reputation. A significant research challenge in this setting is disambiguating tweets with respect to entity names. To address it, we use “context phrases” in a tweet and Wikipedia disambiguated articles for a particular entity in a random forest classifier. We also use the concept of “relatedness” between tweet and entity, computed from the Wikipedia category-article structure, which captures how much of the discussion in a tweet relates to the entity. The experimental evaluations show a significant improvement over the baseline and performance comparable to other systems, a strong result given that we restrict ourselves to features extracted from Wikipedia.
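
To make the idea of a tweet-entity relatedness feature more concrete, here is a minimal sketch. It is not the authors' implementation: the toy category-to-article mapping, the function names `tokenize` and `relatedness`, and the scoring rule (the fraction of tweet tokens covered by phrases that match article titles under the entity's category) are all assumptions chosen for illustration. In the paper, scores of this kind come from the real Wikipedia category-article structure and are used in a random forest classifier.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy stand-in for Wikipedia's category-article structure:
# each key is a candidate sense of the name "apple", each value a set of
# lowercase article titles filed under that sense. Not taken from the paper.
TOY_CATEGORY_ARTICLES: Dict[str, Set[str]] = {
    "Apple Inc.": {"iphone", "macbook", "steve jobs", "app store"},
    "Fruit": {"apple pie", "orchard", "cider"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relatedness(tweet: str, titles: Set[str], max_len: int = 3) -> float:
    """Fraction of tweet tokens covered by phrases that match a title.

    Assumed scoring rule for illustration only: every word n-gram of up to
    max_len tokens is looked up in `titles`, and matched tokens are counted
    once even if several phrases overlap.
    """
    tokens = tokenize(tweet)
    if not tokens:
        return 0.0
    covered = [False] * len(tokens)
    for n in range(1, max_len + 1):
        for start in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[start:start + n])
            if phrase in titles:
                for i in range(start, start + n):
                    covered[i] = True
    return sum(covered) / len(tokens)


tweet = "Steve Jobs unveiled the new iPhone"
for sense, titles in TOY_CATEGORY_ARTICLES.items():
    print(sense, relatedness(tweet, titles))
```

In this toy setting the tweet scores higher against the company sense than against the fruit sense; a score like this would be one input feature among several rather than a decision on its own.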


Relatedness Score, Anchor Text, Phrase Extraction, Music Domain, Duplicate Article
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Muhammad Atif Qureshi (1, 2)
  • Colm O’Riordan (1)
  • Gabriella Pasi (2)
  1. Computational Intelligence Research Group, Information Technology, National University of Ireland, Galway, Ireland
  2. Information Retrieval Lab, Informatics, Systems and Communication, University of Milan Bicocca, Milan, Italy
