Company Name Disambiguation in Tweets: A Two-Step Filtering Approach

  • M. Atif Qureshi
  • Arjumand Younus
  • Colm O’Riordan
  • Gabriella Pasi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9460)


Twitter has become a gold mine for companies that use it as a marketing tool and care about their online reputation. A significant research challenge in this setting is disambiguating tweets with respect to company names. Deciding whether a particular tweet is relevant to a company is an important task that has not yet been solved satisfactorily; to address it, this paper proposes a Wikipedia-based two-step filtering algorithm. Unlike most other methods, the proposed approach is fully automatic and does not rely on hand-coded rules. The first step is a precision-oriented pass that uses Wikipedia as an external knowledge source: it extracts pertinent terms and phrases from certain parts of a company's Wikipedia page and uses them as weighted filters to identify tweets about that company. The second pass expands the first to increase recall by including further terms from URLs in tweets, Twitter user profile information, and hashtags. The approach is evaluated on a CLEF lab dataset and shows good performance, especially for English tweets.
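To make the two-pass idea concrete, the following is a minimal Python sketch written for illustration only; it is not the authors' implementation. The term weights, the single threshold, the `Tweet` fields, and the function names are all hypothetical, and the sketch assumes that the weighted terms have already been extracted from the company's Wikipedia page and that any text behind URLs has already been fetched.

```python
import re
from dataclasses import dataclass, field

TOKEN = re.compile(r"[a-z0-9]+")


def tokens(text):
    """Lowercase a string and return its alphanumeric tokens as a set."""
    return set(TOKEN.findall(text.lower()))


@dataclass
class Tweet:
    text: str
    hashtags: list = field(default_factory=list)
    profile: str = ""    # author's profile description
    url_text: str = ""   # text of linked pages, assumed to be fetched already


def score(words, weights):
    """Sum the weights of the filter terms that occur in `words`."""
    return sum(w for term, w in weights.items() if term in words)


def classify(tweet, weights, threshold=1.0):
    """Return 'pass1', 'pass2', or 'irrelevant' for one tweet.

    `weights` maps filter terms to hypothetical weights; `threshold`
    is an illustrative cut-off, not a value taken from the paper.
    """
    # Pass 1 (precision-oriented): look at the tweet text only.
    words = tokens(tweet.text)
    if score(words, weights) >= threshold:
        return "pass1"
    # Pass 2 (recall-oriented): add terms from hashtags, the user
    # profile, and the text behind URLs, then score again.
    words |= tokens(" ".join(tweet.hashtags))
    words |= tokens(tweet.profile)
    words |= tokens(tweet.url_text)
    if score(words, weights) >= threshold:
        return "pass2"
    return "irrelevant"


# Hypothetical weighted filter for the company "Apple".
apple_weights = {"iphone": 1.0, "ipad": 1.0, "mac": 0.5, "cupertino": 0.5}

direct = Tweet("Just got the new iPhone from Apple")
rescued = Tweet("Apple announces something big", hashtags=["ipad"])
fruit = Tweet("apple pie for dessert", hashtags=["baking"])

for t in (direct, rescued, fruit):
    print(classify(t, apple_weights), "|", t.text)
```

In this toy setup the second tweet is missed by the text-only pass and recovered only once its hashtag is taken into account, which is the kind of recall gain the second pass is meant to provide.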


Keywords: Score Propagation, Twitter User, Proper Noun, Concept Term, Digital Distribution
These keywords were added by machine and not by the authors.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • M. Atif Qureshi (1, 2), corresponding author
  • Arjumand Younus (1, 2)
  • Colm O’Riordan (1)
  • Gabriella Pasi (2)

  1. Computational Intelligence Research Group, Information Technology, National University of Ireland, Galway, Ireland
  2. Information Retrieval Lab, Informatics, Systems and Communication, University of Milan Bicocca, Milan, Italy
