International Conference on Theory and Practice of Digital Libraries

Research and Advanced Technology for Digital Libraries, pp 364-367

Extracting a Topic Specific Dataset from a Twitter Archive

  • Clare Llewellyn
  • Claire Grover
  • Beatrice Alex
  • Jon Oberlander
  • Richard Tobin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9316)

Abstract

Datasets extracted from the microblogging service Twitter are often generated using specific query terms or hashtags. We describe how a dataset produced using the query term ‘syria’ can be expanded to include tweets on the topic of Syria that do not contain that query term. We compare three methods for this task: using the top hashtags from the set as search terms, using a hand-selected set of hashtags as search terms, and using LDA topic modelling to cluster tweets and then selecting the appropriate clusters. We describe an evaluation method for assessing the relevance and accuracy of the tweets returned.
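
As an illustration of the first method only, the following is a minimal sketch of expanding a query-term dataset with its most frequent hashtags. It is not the authors' implementation: the function names (`hashtags`, `top_hashtags`, `expand_dataset`), the cut-off `n`, and the representation of the archive as a list of tweet strings are all assumptions made for this example, and the abstract does not state how many hashtags were used.

```python
import re
from collections import Counter

# Hypothetical pattern: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the lower-cased hashtags in a tweet, without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def top_hashtags(seed_tweets, n=10, exclude=()):
    """Return the n most frequent hashtags in the seed set.

    Tags listed in `exclude` (for example the original query term)
    are dropped, since they cannot retrieve any new tweets.
    """
    counts = Counter(tag for tweet in seed_tweets for tag in hashtags(tweet))
    for tag in exclude:
        counts.pop(tag.lower(), None)
    return [tag for tag, _ in counts.most_common(n)]


def expand_dataset(archive, seed_term, n=10):
    """Split an archive into the seed set and the extra tweets it leads to.

    seed:  tweets that contain the query term (case-insensitive).
    extra: tweets that lack the query term but carry at least one of
           the seed set's top-n hashtags.
    """
    seed = [t for t in archive if seed_term in t.lower()]
    tags = set(top_hashtags(seed, n=n, exclude=(seed_term,)))
    extra = [
        t for t in archive
        if seed_term not in t.lower() and tags.intersection(hashtags(t))
    ]
    return seed, extra
```

With a toy archive such as `["Fighting continues in #Syria #Aleppo", "News from #aleppo today", "I love #coffee"]`, calling `expand_dataset(archive, "syria", n=1)` places the first tweet in the seed set and retrieves the second one through the shared hashtag. The retrieved tweets would still need to be checked for relevance, which is the purpose of the evaluation the abstract mentions.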

Keywords

Social media, Topic modelling, Data selection

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Clare Llewellyn (1)
  • Claire Grover (1)
  • Beatrice Alex (1)
  • Jon Oberlander (1)
  • Richard Tobin (1)
  1. School of Informatics, University of Edinburgh, Edinburgh, UK
