Filter Keywords and Majority Class Strategies for Company Name Disambiguation in Twitter

Spina, Damiano; Amigó, Enrique; Gonzalo, Julio

doi:10.1007/978-3-642-23708-9_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6941)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

Monitoring the online reputation of a company starts by retrieving all (fresh) information where the company is mentioned. A major problem in this context is that company names are often ambiguous (apple may refer to the company, the fruit, the singer, etc.). The problem is particularly hard in microblogging, where there is little context to disambiguate: this was the task addressed in the WePS-3 CLEF lab exercise in 2010. This paper introduces a novel fingerprint representation technique to visualize and compare system results for the task. We apply this technique to the systems that originally participated in WePS-3, and then use it to explore the usefulness of two signals for addressing the problem: filter keywords (those whose presence in a tweet reliably signals either the positive or the negative class) and the majority class (whether positive or negative tweets are predominant for a given company name in a tweet stream). Our study shows that both are key signals for solving the task. We also find that, remarkably, the vocabulary associated with a company on the Web does not seem to match the vocabulary used in Twitter streams: even a manual extraction of filter keywords from web pages has substantially lower recall than an oracle selection of the best terms from the Twitter stream.
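As a rough illustration of how the two signals named in the abstract could be combined, the sketch below labels a tweet from filter keywords when exactly one class of keyword is present and otherwise falls back to the majority class. This is not the authors' implementation: the function names, the whitespace tokenizer, the example keyword sets, and the choice to estimate the majority class from the keyword-labelled tweets are all assumptions made for this example.

```python
from collections import Counter
from typing import List, Optional, Set


def tokenize(tweet: str) -> Set[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped (a simplification)."""
    tokens = {tok.strip(".,!?:;\"'()").lower() for tok in tweet.split()}
    tokens.discard("")
    return tokens


def classify_by_keywords(tweet: str, positive: Set[str], negative: Set[str]) -> Optional[bool]:
    """Return True (related to the company), False (unrelated), or None if undecided."""
    tokens = tokenize(tweet)
    has_pos = bool(tokens & positive)
    has_neg = bool(tokens & negative)
    if has_pos == has_neg:  # no filter keyword at all, or conflicting keywords
        return None
    return has_pos


def disambiguate(tweets: List[str], positive: Set[str], negative: Set[str]) -> List[bool]:
    """Label each tweet by filter keywords, falling back to the majority class."""
    labels = [classify_by_keywords(t, positive, negative) for t in tweets]
    counts = Counter(label for label in labels if label is not None)
    # Assumption: the majority class is estimated from the keyword-labelled tweets;
    # a tie (including no labelled tweets) defaults to "related".
    majority = counts[True] >= counts[False]
    return [majority if label is None else label for label in labels]


# Hypothetical data for the ambiguous name "apple".
tweets = [
    "New apple iphone announced today",
    "Baked an apple pie with cinnamon",
    "apple store opening downtown",
    "I love apple",
]
positive_keywords = {"iphone", "store", "mac"}
negative_keywords = {"pie", "cinnamon", "fruit"}

print(disambiguate(tweets, positive_keywords, negative_keywords))
# prints [True, False, True, True]
```

The last tweet contains no filter keyword, so it inherits the majority label of the stream. In this toy setup, better keyword coverage directly reduces how many tweets depend on that fallback.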

This research was partially supported by the Spanish Ministry of Education via a doctoral grant to the first author (AP2009-0507) and the Spanish Ministry of Science and Innovation (Holopedia Project, TIN2010-21128-C02).

Author information

Authors and Affiliations

UNED NLP & IR Group, Juan del Rosal, 16, 28040, Madrid, Spain
Damiano Spina, Enrique Amigó & Julio Gonzalo

Editor information

Editors and Affiliations

Center for the Evaluation of Language and Communication Technologies (CELCT), Via alla Casata 56/c, 38123, Povo, Italy
Pamela Forner
National University of Distance Education, E.T.S.I. Informática de la UNED, c/Juan del Rosal 16, 28040, Madrid, Spain
Julio Gonzalo
School of Information Sciences, University of Tampere, Kanslerinrinne 1, 33014, Tampere, Finland
Jaana Kekäläinen
Yahoo! Research, Avinguda Diagonal 177, 8th Floor, 08018, Barcelona, Spain
Mounia Lalmas
Intelligent Systems Laboratory, University of Amsterdam, Science Park 107, 1098 XG, Amsterdam, The Netherlands
Maarten de Rijke

Cite this paper

Spina, D., Amigó, E., Gonzalo, J. (2011). Filter Keywords and Majority Class Strategies for Company Name Disambiguation in Twitter. In: Forner, P., Gonzalo, J., Kekäläinen, J., Lalmas, M., de Rijke, M. (eds) Multilingual and Multimodal Information Access Evaluation. CLEF 2011. Lecture Notes in Computer Science, vol 6941. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23708-9_7

DOI: https://doi.org/10.1007/978-3-642-23708-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23707-2
Online ISBN: 978-3-642-23708-9
eBook Packages: Computer Science, Computer Science (R0)
