Decision Tree-Based Evaluation of Genitive Classification – An Empirical Study on CMC and Text Corpora

  • Sandra Hansen
  • Roman Schneider
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8105)

Abstract

Contemporary studies on the characteristics of natural language benefit enormously from the growing number of linguistic corpora. Alongside text and speech corpora, corpora of computer-mediated communication (CMC) occupy a position between orality and literacy and, beyond that, provide insight into the impact of "new", mainly internet-based media on language behaviour. In this paper, we present an empirical study that uses annotated CMC corpora to explain linguistic phenomena. Specifically, we implement machine learning algorithms to produce decision trees that reveal rules and tendencies in the use of genitive markers in German.
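The abstract does not spell out the learning procedure. As a hedged illustration of how a decision tree can turn annotated tokens into readable rules about genitive markers, the following Python sketch grows a tree using the gain-ratio split criterion known from Quinlan's C4.5, simplified to categorical features with no pruning and no handling of missing values. The feature names (`sibilant`, `syllables`, `corpus`), the nine example rows and all function names are hypothetical and invented for illustration; they are not the authors' data, features or results.

```python
from collections import Counter
from math import log2

# HYPOTHETICAL toy data, invented for illustration only (not the paper's data).
# Each row is one strong masculine/neuter noun token in the genitive.
# Features: stem ends in a sibilant?, syllable count, corpus type.
# Label: the genitive marker used (-es vs. -s).
ROWS = [
    ({"sibilant": "yes", "syllables": "1",  "corpus": "text"}, "-es"),
    ({"sibilant": "yes", "syllables": "2+", "corpus": "CMC"},  "-es"),
    ({"sibilant": "yes", "syllables": "1",  "corpus": "CMC"},  "-es"),
    ({"sibilant": "no",  "syllables": "2+", "corpus": "text"}, "-s"),
    ({"sibilant": "no",  "syllables": "2+", "corpus": "CMC"},  "-s"),
    ({"sibilant": "no",  "syllables": "1",  "corpus": "text"}, "-es"),
    ({"sibilant": "no",  "syllables": "1",  "corpus": "CMC"},  "-s"),
    ({"sibilant": "no",  "syllables": "1",  "corpus": "text"}, "-es"),
    ({"sibilant": "no",  "syllables": "2+", "corpus": "text"}, "-s"),
]


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def partition(rows, attr):
    """Group rows by their value for one attribute."""
    groups = {}
    for features, label in rows:
        groups.setdefault(features[attr], []).append((features, label))
    return groups


def gain_ratio(rows, attr):
    """C4.5-style criterion: information gain divided by split information."""
    n = len(rows)
    groups = partition(rows, attr)
    remainder = sum(len(g) / n * entropy([lab for _, lab in g]) for g in groups.values())
    gain = entropy([lab for _, lab in rows]) - remainder
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy([f[attr] for f, _ in rows])
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, attrs):
    """Grow a tree recursively; leaves are labels, inner nodes are {attr: {value: subtree}}."""
    labels = [lab for _, lab in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, a))
    if gain_ratio(rows, best) <= 0:
        return majority
    rest = [a for a in attrs if a != best]
    return {best: {value: build_tree(group, rest)
                   for value, group in sorted(partition(rows, best).items())}}


def classify(tree, features):
    """Follow the branches matching the token's feature values down to a leaf."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[features[attr]]
    return tree


tree = build_tree(ROWS, ["sibilant", "syllables", "corpus"])
print(tree)
```

Read top-down, the resulting tree yields rules such as "sibilant stem → -es"; in this invented data the corpus attribute only becomes relevant for monosyllabic, non-sibilant nouns. Gain ratio is used here instead of plain information gain because it penalises attributes with many distinct values. A feature value unseen during training raises a `KeyError` in `classify`; a fuller implementation would fall back to the node's majority label.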

Keywords

Corpus Linguistics, Computer-Mediated Communication, Machine Learning, Decision Trees, Grammar, Genitive Classification

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sandra Hansen (1)
  • Roman Schneider (1)
  1. Institute for German Language (IDS), Mannheim, Germany