Abstract
For many years, work on Part-of-Speech (POS) tagging has concentrated mainly on standardized texts. However, interest in the automatic evaluation of social media texts is growing considerably. Because social media texts differ clearly from standardized texts, Natural Language Processing methods need to be adapted to process them reliably. The basis for such an adaptation is a reliably tagged social media training corpus. In this paper, we introduce a new social media text corpus and evaluate several state-of-the-art POS taggers retrained on that corpus. In particular, we study how well a tagger trained on one social media text type applies to other types, such as chat messages or blog comments. We show that retraining the taggers on in-domain training data increases tagging accuracy by more than five percentage points.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Neunerdt, M., Trevisan, B., Reyer, M., Mathar, R. (2013). Part-Of-Speech Tagging for Social Media Texts. In: Gurevych, I., Biemann, C., Zesch, T. (eds) Language Processing and Knowledge in the Web. Lecture Notes in Computer Science, vol 8105. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40722-2_15
DOI: https://doi.org/10.1007/978-3-642-40722-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40721-5
Online ISBN: 978-3-642-40722-2
eBook Packages: Computer Science, Computer Science (R0)