Bidirectional LSTM Tagger for Latvian Grammatical Error Detection

  • Daiga Deksne
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11697)


This paper reports on the development of a grammatical error labeling system for the Latvian language. We label six error types that are crucial for understanding a text, as noted in a survey of native Latvian speakers: incorrect use of a preposition, incorrect agreement within a phrase, an incorrect verb form, an incorrect noun form, an incorrect choice of the definite or indefinite ending of an adjective, and a missing comma. Training a neural network model requires a large amount of error-annotated data. To cope with the lack of manually annotated data, we generate artificial errors in correct text. We choose a bidirectional Long Short-Term Memory (LSTM) neural network architecture because several authors consider it the best for detecting erroneous words. We train several models: models that label a single error type and models that label all six error types. For all error types, precision reaches 94.61% and recall reaches 94.08%.
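The idea of generating artificial errors in correct text can be illustrated for the simplest of the six error types, the missing comma. The following is a minimal sketch, not the procedure used in the paper: the function name `drop_commas`, the `rate` parameter, the label names `OK` and `COMMA`, and the choice to put the label on the token preceding the removed comma are all assumptions made for illustration.

```python
import random


def drop_commas(tokens, rate=0.5, seed=0):
    """Remove some commas from a correct token sequence and label the result.

    Returns (noisy_tokens, labels). A token is labeled "COMMA" when a comma
    that followed it was removed, and "OK" otherwise. The labeling scheme is
    a hypothetical one chosen for this sketch.
    """
    rng = random.Random(seed)
    noisy, labels = [], []
    for tok in tokens:
        # Only drop a comma that has a preceding token to carry the label.
        if tok == "," and noisy and rng.random() < rate:
            labels[-1] = "COMMA"
            continue
        noisy.append(tok)
        labels.append("OK")
    return noisy, labels


# Illustrative sentence: "Es zinu, ka viņš nāks." ("I know that he will come.")
correct = ["Es", "zinu", ",", "ka", "viņš", "nāks", "."]
noisy, labels = drop_commas(correct, rate=1.0)
for token, label in zip(noisy, labels):
    print(token, label)
```

Pairs of noisy tokens and labels produced this way have the shape a sequence tagger expects as training data: one label per token. The other five error types would need morphological resources to produce plausible wrong forms, which this sketch does not attempt.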


Grammar errors, Neural network, Word embeddings



The research has been supported by the European Regional Development Fund within the project “Neural Network Modelling for Inflected Natural Languages” No.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Tilde, Riga, Latvia
