Relevancer: Finding and Labeling Relevant Information in Tweet Collections

  • Ali Hürriyetoǧlu
  • Christian Gudehus
  • Nelleke Oostdijk
  • Antal van den Bosch
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10047)


We introduce a tool that supports knowledge workers who want to gain insights from a tweet collection but, due to time constraints, cannot go over all tweets. Our system first pre-processes, de-duplicates, and clusters the tweets. The detected clusters are presented to the expert as so-called information threads. Subsequently, based on the information thread labels provided by the expert, a classifier is trained that can be used to classify additional tweets. As a case study, the tool is evaluated on a tweet collection based on the key terms ‘genocide’ and ‘Rohingya’. The average precision and recall of the classifier on six classes are 0.83 and 0.82, respectively. At this level of performance, experts can use the tool to manage tweet collections efficiently without missing much information.
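The workflow in the abstract (pre-process, de-duplicate, cluster, let an expert label the clusters, train a classifier) can be illustrated with a minimal sketch. The abstract does not say which similarity measure, clustering algorithm, or classifier the tool uses, so everything below is an assumption chosen for brevity: Jaccard similarity on token sets, greedy single-pass clustering, and a smoothed token-likelihood scorer. The function names, thresholds, example tweets, and labels are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def preprocess(tweet):
    """Lowercase, drop URLs and user mentions, return the set of word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return frozenset(re.findall(r"[a-z0-9']+", text))

def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def deduplicate(token_sets, threshold=0.8):
    """Keep a tweet only if it is not a near-duplicate of one already kept."""
    kept = []
    for tokens in token_sets:
        if all(jaccard(tokens, k) < threshold for k in kept):
            kept.append(tokens)
    return kept

def cluster(token_sets, threshold=0.3):
    """Greedy clustering: join the first cluster whose seed tweet is similar enough."""
    clusters = []
    for tokens in token_sets:
        for members in clusters:
            if jaccard(tokens, members[0]) >= threshold:
                members.append(tokens)
                break
        else:
            clusters.append([tokens])
    return clusters

def train(labeled_clusters):
    """Count tokens per label from (label, cluster) pairs supplied by the expert."""
    counts = defaultdict(Counter)
    for label, members in labeled_clusters:
        for tokens in members:
            counts[label].update(tokens)
    return counts

def classify(counts, tokens):
    """Return the label with the highest add-one-smoothed log-likelihood."""
    vocab = {t for c in counts.values() for t in c}
    def score(label):
        total = sum(counts[label].values()) + len(vocab)
        return sum(math.log((counts[label][t] + 1) / total) for t in tokens)
    return max(counts, key=score)

# Invented toy tweets, not taken from the paper's collection.
tweets = [
    "Heavy rain floods the city centre http://example.org/a",
    "Heavy rain floods the city centre http://example.org/b",
    "Rain floods streets near the city centre",
    "Football final tonight at the stadium",
    "Tickets for the football final at the stadium",
]
unique = deduplicate([preprocess(t) for t in tweets])
threads = cluster(unique)
# Hypothetical expert labels, one per detected cluster.
expert_labels = ["weather", "sports"]
model = train(zip(expert_labels, threads))
label = classify(model, preprocess("Rain near the stadium floods the city"))
print(len(unique), len(threads), label)
```

In this sketch the expert labels whole clusters rather than single tweets, which is what keeps the labeling effort small; a real system would likely need a more robust similarity measure and classifier than the ones assumed here.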


Social media analysis, Event analysis, Data mining, Text mining, Machine learning, Social signal processing, Decision support systems, Genocide, Rohingya



This research was funded by the Dutch national research programme COMMIT. We gratefully acknowledge the contribution by Statistics Netherlands.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Ali Hürriyetoǧlu (1, 2)
  • Christian Gudehus (3)
  • Nelleke Oostdijk (2)
  • Antal van den Bosch (2)
  1. Statistics Netherlands, Heerlen, The Netherlands
  2. Centre for Language Studies, Radboud University, Nijmegen, The Netherlands
  3. Faculty of Social Science, Ruhr University Bochum, Bochum, Germany
