
Relevancer: Finding and Labeling Relevant Information in Tweet Collections

  • Conference paper
Social Informatics (SocInfo 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10047)



We introduce a tool that supports knowledge workers who want to gain insights from a tweet collection but, due to time constraints, cannot go over all tweets. Our system first pre-processes, de-duplicates, and clusters the tweets. The detected clusters are presented to the expert as so-called information threads. Subsequently, based on the information thread labels provided by the expert, a classifier is trained that can be used to classify additional tweets. As a case study, the tool is evaluated on a tweet collection based on the key terms ‘genocide’ and ‘Rohingya’. The average precision and recall of the classifier on six classes are 0.83 and 0.82, respectively. At this level of performance, experts can use the tool to manage tweet collections efficiently without missing much information.
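As a rough illustration of two steps named in the abstract, the sketch below shows one way near-duplicate tweets could be dropped and how per-class precision and recall can be averaged over classes. It is not the authors' implementation: the function names, the Jaccard similarity measure, the 0.8 threshold, and the choice of an unweighted (macro) average are all assumptions made for this example.

```python
import re


def normalize(tweet):
    """Lowercase a tweet, mask URLs and user mentions, collapse whitespace."""
    tweet = re.sub(r"https?://\S+", "<url>", tweet.lower())
    tweet = re.sub(r"@\w+", "<user>", tweet)
    return re.sub(r"\s+", " ", tweet).strip()


def jaccard(a, b):
    """Jaccard similarity between two sets of tokens."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(tweets, threshold=0.8):
    """Keep a tweet only if it is not a near-duplicate of one already kept.

    The threshold is an illustrative value, not one taken from the paper.
    """
    kept, kept_tokens = [], []
    for tweet in tweets:
        tokens = set(normalize(tweet).split())
        if all(jaccard(tokens, seen) < threshold for seen in kept_tokens):
            kept.append(tweet)
            kept_tokens.append(tokens)
    return kept


def macro_precision_recall(gold, predicted):
    """Average per-class precision and recall, weighting each class equally.

    Assumes `gold` is non-empty and aligned with `predicted`.
    """
    classes = sorted(set(gold))
    precisions, recalls = [], []
    for cls in classes:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == cls and p == cls)
        fp = sum(1 for g, p in pairs if g != cls and p == cls)
        fn = sum(1 for g, p in pairs if g == cls and p != cls)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)
```

The pairwise comparison in `deduplicate` is quadratic in the number of tweets, which is acceptable for a small illustration but would likely need hashing or indexing for a large collection.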




  6. We used scikit-learn v0.17.1 for all machine learning tasks in this study.

  7. The annotation is designed to be done or coordinated by a single person in our setting.

  8. The expert may prefer to tolerate a few different tweets at the end of the group if the majority of the tweets are coherent, and treat the cluster as coherent.

  9. We note that the repetition pattern analysis is valuable in its own right. However, this information is not within the scope of the present study.





This research was funded by the Dutch national research programme COMMIT. We gratefully acknowledge the contribution by Statistics Netherlands.



Corresponding author

Correspondence to Ali Hürriyetoǧlu.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Hürriyetoǧlu, A., Gudehus, C., Oostdijk, N., van den Bosch, A. (2016). Relevancer: Finding and Labeling Relevant Information in Tweet Collections. In: Spiro, E., Ahn, YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10047. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47873-9

  • Online ISBN: 978-3-319-47874-6

  • eBook Packages: Computer Science, Computer Science (R0)
