TWORPUS – An Easy-to-Use Tool for the Creation of Tailored Twitter Corpora

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 8105)

Abstract

In this paper we present Tworpus, an easy-to-use tool for the creation of tailored Twitter corpora. Tworpus allows scholars to create corpora without having to know about the Twitter Application Programming Interface (API) and related technical aspects. At the same time, our tool complies with Twitter’s “rules of the road” on how to use tweet data. Corpora may be composed in various sizes and for specific scenarios, as the Tworpus interface provides controls for filtering and gathering customized collections of tweets, which may serve as the basis for subsequent analyses.

Keywords

  • Twitter API
  • web corpora
  • social media corpora
  • corpus tool
  • corpus creation

Chapter length: 12 pages

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bazo, A., Burghardt, M., Wolff, C. (2013). TWORPUS – An Easy-to-Use Tool for the Creation of Tailored Twitter Corpora. In: Gurevych, I., Biemann, C., Zesch, T. (eds) Language Processing and Knowledge in the Web. Lecture Notes in Computer Science, vol 8105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40722-2_3

  • DOI: https://doi.org/10.1007/978-3-642-40722-2_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40721-5

  • Online ISBN: 978-3-642-40722-2

  • eBook Packages: Computer Science, Computer Science (R0)