Relevancer: Finding and Labeling Relevant Information in Tweet Collections
We introduce a tool that supports knowledge workers who want to gain insights from a tweet collection but cannot read every tweet because of time constraints. Our system first pre-processes, de-duplicates, and clusters the tweets. The detected clusters are presented to the expert as so-called information threads. The expert labels these threads, and the labels are then used to train a classifier that can be applied to additional tweets. As a case study, the tool is evaluated on a tweet collection based on the key terms ‘genocide’ and ‘Rohingya’. Averaged over six classes, the classifier reaches a precision of 0.83 and a recall of 0.82. At this level of performance, experts can use the tool to manage tweet collections efficiently without missing much information.
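The workflow described above (pre-process, de-duplicate, cluster, label, classify) can be illustrated with a minimal sketch. The abstract does not say which algorithms the tool uses, so everything concrete here is an assumption made for illustration: the normalization rules, the greedy Jaccard clustering with a 0.5 threshold, the multinomial naive Bayes classifier, the example tweets, and the thread labels `events` and `petitions`.

```python
import math
import re
from collections import Counter, defaultdict


def normalize(tweet):
    """Lowercase, mask URLs and user mentions, drop a leading retweet prefix."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", "url", text)
    text = re.sub(r"@\w+", "user", text)
    text = re.sub(r"^rt user:?\s*", "", text)
    return " ".join(text.split())


def tokens(text):
    """Split normalized text into word and hashtag tokens."""
    return re.findall(r"#?\w+", text)


def deduplicate(tweets):
    """Keep the first occurrence of each normalized tweet, preserving order."""
    seen, unique = set(), []
    for tweet in tweets:
        key = normalize(tweet)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def jaccard(a, b):
    """Jaccard similarity between two token lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(texts, threshold=0.5):
    """Greedy one-pass clustering: join the first cluster whose seed is similar."""
    clusters = []  # each cluster is a list of texts; element 0 is the seed
    for text in texts:
        for members in clusters:
            if jaccard(tokens(text), tokens(members[0])) >= threshold:
                members.append(text)
                break
        else:
            clusters.append([text])
    return clusters


def train_naive_bayes(labeled):
    """Train multinomial naive Bayes with add-one smoothing; return predict()."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labeled:
        label_counts[label] += 1
        for tok in tokens(normalize(text)):
            word_counts[label][tok] += 1
            vocab.add(tok)
    total = sum(label_counts.values())

    def predict(text):
        def score(label):
            n = sum(word_counts[label].values())
            s = math.log(label_counts[label] / total)
            for tok in tokens(normalize(text)):
                s += math.log((word_counts[label][tok] + 1) / (n + len(vocab)))
            return s
        return max(label_counts, key=score)

    return predict


# Invented example tweets; not taken from the evaluated collection.
tweets = [
    "RT @news: Rohingya refugees flee violence https://t.co/abc",
    "Rohingya refugees flee violence https://t.co/xyz",
    "Rohingya refugees flee new violence",
    "Petition: recognise the genocide now",
    "Sign the petition to recognise the genocide now",
]
unique = deduplicate(tweets)
threads = cluster(unique)

# The expert assigns one (hypothetical) label per information thread.
thread_labels = ["events", "petitions"]
labeled = [(t, lab) for members, lab in zip(threads, thread_labels) for t in members]
predict = train_naive_bayes(labeled)
```

In this sketch the expert labels whole threads rather than single tweets, so one decision yields several training examples; `predict` can then be applied to tweets that were never shown to the expert.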
Keywords: Social media analysis; Event analysis; Data mining; Text mining; Machine learning; Social signal processing; Decision support systems; Genocide; Rohingya
This research was funded by the Dutch national research programme COMMIT. We gratefully acknowledge the contribution by Statistics Netherlands.