Instance-Based Learning for Tweet Monitoring and Categorization

Gobeill, Julien; Gaudinat, Arnaud; Ruch, Patrick

doi:10.1007/978-3-319-24027-5_22

Conference paper
First Online: 20 November 2015

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9283)

Abstract

The CLEF RepLab 2014 Track was an occasion to investigate the robustness of instance-based learning in a complete system for tweet monitoring and categorization. The algorithm we implemented was k-Nearest Neighbors (k-NN). With respect to the domain (automotive or banking) and the language (English or Spanish), the experiments showed that the categorizer was not affected by the choice of representation: even with all learning tweets merged into one single Knowledge Base (KB), the observed performance was close to that obtained with dedicated KBs. Interestingly, adding English training data to the sparse Spanish data was useful for Spanish categorization (+14% accuracy for automotive, +26% for banking). Yet performance suffered from overprediction of the most prevalent category. The algorithm showed the defects of its virtues: it was very robust, but not easy to improve. BiTeM/SIBtex tools for tweet monitoring are available within the DrugsListener Project page of the BiTeM website (http://bitem.hesge.ch/).
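To make the idea of instance-based categorization over a merged KB concrete, the following is a minimal sketch, not the system described in the paper. It assumes a bag-of-words representation with cosine similarity and similarity-weighted voting; the abstract does not specify the indexing or weighting actually used. The example tweets, the category labels ("products", "performance"), and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_categorize(tweet, kb, k=3):
    """Return the label with the highest summed similarity among the k nearest KB tweets."""
    query = Counter(tokenize(tweet))
    scored = sorted(
        ((cosine(query, Counter(tokenize(text))), label) for text, label in kb),
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Hypothetical dedicated KBs, one per domain (toy data, not from RepLab).
automotive_kb = [
    ("the new sedan has great fuel economy", "products"),
    ("quarterly sales of the carmaker dropped", "performance"),
]
banking_kb = [
    ("the bank launched a new credit card", "products"),
    ("bank profits fell this quarter", "performance"),
]

# Merged KB: all learning tweets pooled, regardless of domain.
merged_kb = automotive_kb + banking_kb

print(knn_categorize("new credit card from my bank", merged_kb, k=3))
print(knn_categorize("profits fell at the carmaker", merged_kb, k=3))
```

In this toy setting the nearest neighbours of each query mostly come from instances that share its vocabulary, which gives an intuition for why pooling domains or languages into one KB need not hurt a k-NN categorizer. The same voting scheme also hints at the overprediction issue: a category with many instances in the KB has more chances to appear among the k neighbours.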

Author information

Authors and Affiliations

BiTeM Group, HEG/HES-SO, University of Applied Sciences, 7 rte de Drize, 1227, Carouge, Switzerland
Julien Gobeill, Arnaud Gaudinat & Patrick Ruch
SIBtex Group, SIB Swiss Institute of Bioinformatics, 1 rue Michel-Servet, 1206, Genève, Switzerland
Julien Gobeill & Patrick Ruch

Corresponding author

Correspondence to Julien Gobeill.

Editor information

Editors and Affiliations

Institut de Recherche en Informatique de Toulouse, Toulouse, France
Josanne Mothe
Department of Computer Science, University of Neuchatel, Neuchâtel, Switzerland
Jacques Savoy
Faculteit der Geesteswetenschappen, Universiteit Amsterdam, Amsterdam, The Netherlands
Jaap Kamps
Institut de Recherche en Informatique de Toulouse, Toulouse, France
Karen Pinel-Sauvagnat
School of Computing, Dublin City University, Dublin, Ireland
Gareth Jones
LIA - CERI, Université d'Avignon et des Pays de Vaucluse, Avignon, France
Eric San Juan
Department of Information Engineering, University of Padua, Padua, Italy
Linda Capellato
Department of Information Engineering (DEI), University of Padua, Padova, Italy
Nicola Ferro

About this paper

Cite this paper

Gobeill, J., Gaudinat, A., Ruch, P. (2015). Instance-Based Learning for Tweet Monitoring and Categorization. In: Mothe, J., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2015. Lecture Notes in Computer Science, vol 9283. Springer, Cham. https://doi.org/10.1007/978-3-319-24027-5_22

DOI: https://doi.org/10.1007/978-3-319-24027-5_22
Published: 20 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24026-8
Online ISBN: 978-3-319-24027-5
eBook Packages: Computer Science, Computer Science (R0)
