On Morphological Analysis for Learner Language, Focusing on Russian

  • Markus Dickinson


We describe a framework for performing morphological analysis to account for learner language, focusing on Russian as an example of an inflecting language. Because a set of linguistic analyses is needed to provide feedback on potentially noisy data, there is a large amount of ambiguity for even well-formed words. Using a segmented POS lexicon as a test case, we show how to analyze subparts of words, in order to analyze variations. After describing and implementing this framework for Russian, we focus on removing undesirable analyses to keep the task feasible. This is essentially an investigation of how much overgeneration of analyses is a problem and under what assumptions it can be reduced.
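To make the idea of analyzing subparts of words concrete, here is a minimal sketch of segmentation against a toy lexicon. It is an illustration under stated assumptions, not the implementation described in the article: the lexicon entries, the tag names, the `analyze` function, and its `strict` flag are all hypothetical, and the ill-formed input `книгет` is an invented example rather than attested learner data.

```python
# Toy segmented lexicon: stems and endings are stored separately.
# All entries and tag names are hypothetical, chosen only for illustration.
STEMS = {
    "книг": "noun",  # 'book'
    "чита": "verb",  # 'read'
}
ENDINGS = {
    "а": [("noun", "nom.sg")],
    "у": [("noun", "acc.sg")],
    "и": [("noun", "gen.sg"), ("noun", "nom.pl")],
    "ю": [("verb", "1sg.pres")],
    "ет": [("verb", "3sg.pres")],
}


def analyze(word, strict=False):
    """Return every stem + ending split of `word` found in the toy lexicon.

    Each analysis is (stem, stem_pos, ending, ending_pos, features).
    With strict=False, a stem may combine with an ending of another part
    of speech, so ill-formed words still receive an analysis.
    With strict=True, only splits whose stem and ending agree in part of
    speech are kept, which prunes those extra analyses.
    """
    analyses = []
    for i in range(1, len(word)):
        stem, ending = word[:i], word[i:]
        if stem not in STEMS or ending not in ENDINGS:
            continue
        for ending_pos, features in ENDINGS[ending]:
            if strict and ending_pos != STEMS[stem]:
                continue
            analyses.append((stem, STEMS[stem], ending, ending_pos, features))
    return analyses


# A well-formed word is already ambiguous (genitive singular or nominative plural).
well_formed = analyze("книги")

# An invented ill-formed word: noun stem with a verb ending.
permissive = analyze("книгет")
restricted = analyze("книгет", strict=True)
```

In this sketch the permissive setting is what lets an ill-formed word receive an analysis that could drive feedback, and it is also where extra analyses come from. The `strict` flag stands in for one possible assumption under which they can be pruned.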


Keywords: Learner language, Russian, Morphological analysis



Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  1. Department of Linguistics, Indiana University, Bloomington, USA
