Part-Of-Speech Tagging for Social Media Texts

  • Conference paper
Language Processing and Knowledge in the Web

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8105)

Abstract

For many years, work on Part-of-Speech (POS) tagging has concentrated mainly on standardized texts. However, interest in the automatic evaluation of social media texts is growing considerably. Because social media texts differ clearly from standardized texts, Natural Language Processing methods need to be adapted before they can process such texts reliably. The basis for such an adaptation is a reliably tagged training corpus of social media texts. In this paper, we introduce a new social media text corpus and evaluate different state-of-the-art POS taggers that are retrained on that corpus. In particular, we study how well a tagger trained on one specific social media text type applies to other types, such as chat messages or blog comments. We show that retraining the taggers on in-domain training data increases tagging accuracy by more than five percentage points.
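The improvement reported above is stated in percentage points, that is, as an absolute difference between two accuracies. The following minimal sketch illustrates that arithmetic, assuming token-level accuracy (the share of tokens whose predicted tag matches the gold tag). The function names and the ten-token toy example are hypothetical illustrations; the tag sequences are invented and are not results or data from the paper.

```python
from typing import Sequence


def tagging_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def gain_in_percentage_points(baseline: float, retrained: float) -> float:
    """Absolute accuracy difference, expressed in percentage points."""
    return (retrained - baseline) * 100.0


# Invented toy example with STTS-style tags (assumption: NOT data from the paper).
gold_tags = ["PPER", "VVFIN", "ART", "ADJA", "NN", "ITJ", "ADV", "VAFIN", "PPER", "$."]
# Hypothetical output of a tagger trained only on standardized text: 2 errors.
baseline_tags = ["PPER", "VVFIN", "ART", "NN", "NN", "NN", "ADV", "VAFIN", "PPER", "$."]
# Hypothetical output of the same tagger after in-domain retraining: 1 error.
retrained_tags = ["PPER", "VVFIN", "ART", "NN", "NN", "ITJ", "ADV", "VAFIN", "PPER", "$."]

baseline_acc = tagging_accuracy(gold_tags, baseline_tags)
retrained_acc = tagging_accuracy(gold_tags, retrained_tags)
gain = gain_in_percentage_points(baseline_acc, retrained_acc)

print(f"baseline:  {baseline_acc:.1%}")
print(f"retrained: {retrained_acc:.1%}")
print(f"gain:      {gain:.1f} percentage points")
```

A percentage-point gain is not the same as a relative improvement: in this toy case the accuracy rises by 10 percentage points, while the relative increase over the baseline is 12.5 percent.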

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Neunerdt, M., Trevisan, B., Reyer, M., Mathar, R. (2013). Part-Of-Speech Tagging for Social Media Texts. In: Gurevych, I., Biemann, C., Zesch, T. (eds) Language Processing and Knowledge in the Web. Lecture Notes in Computer Science, vol 8105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40722-2_15

  • DOI: https://doi.org/10.1007/978-3-642-40722-2_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40721-5

  • Online ISBN: 978-3-642-40722-2

  • eBook Packages: Computer Science, Computer Science (R0)
