Bidirectional LSTM Tagger for Latvian Grammatical Error Detection

Part of the Lecture Notes in Computer Science book series (LNAI, volume 11697)

Abstract

This paper reports on the development of a grammatical error labeling system for Latvian. We label six error types that, according to a survey of native Latvian speakers, are crucial for understanding a text: an incorrect preposition, incorrect agreement within a phrase, an incorrect verb form, an incorrect noun form, an incorrect choice of the definite or indefinite ending of an adjective, and a missing comma. Training a neural network model requires a large amount of error-annotated data. To compensate for the lack of manually annotated data, we generate artificial errors in correct text. We chose a bidirectional Long Short-Term Memory (BiLSTM) neural network because several authors consider this architecture the best for detecting erroneous words. We train several models: models that label a single error type and models that label all six error types. Across all error types, precision reaches 94.61% and recall reaches 94.08%.
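The abstract describes two technical steps: corrupting correct text to obtain error-annotated training data, and scoring a token tagger by precision and recall. The Python sketch below illustrates both under stated assumptions. The label names (`O`, `COMMA`, `PREP`), the preposition list, the corruption probabilities, and the convention of flagging the word that follows a deleted comma are hypothetical choices for illustration, not the paper's actual error-generation rules or tag set. It covers only two of the six error types, and `precision_recall` uses one common token-level definition; the paper's exact evaluation protocol may differ. The BiLSTM tagger itself is not shown.

```python
import random
from typing import List, Tuple

# Hypothetical tag set: the paper's real labels are not given in this abstract.
CORRECT = "O"
MISSING_COMMA = "COMMA"
WRONG_PREPOSITION = "PREP"

# Illustrative subset of Latvian prepositions (assumption, not the paper's list).
PREPOSITIONS = ["ar", "bez", "no", "uz", "par", "pie", "pēc"]


def inject_errors(tokens: List[str], rng: random.Random,
                  p_comma: float = 0.5,
                  p_prep: float = 0.3) -> Tuple[List[str], List[str]]:
    """Corrupt a correct tokenized sentence; return (corrupted tokens, per-token labels)."""
    out_tokens: List[str] = []
    labels: List[str] = []
    comma_dropped = False
    for tok in tokens:
        if tok == "," and rng.random() < p_comma:
            # Delete the comma and remember to flag the next surviving word.
            comma_dropped = True
            continue
        label = MISSING_COMMA if comma_dropped else CORRECT
        comma_dropped = False
        if tok.lower() in PREPOSITIONS and rng.random() < p_prep:
            # Swap in a different preposition (lowercased); this label overrides COMMA.
            tok = rng.choice([p for p in PREPOSITIONS if p != tok.lower()])
            label = WRONG_PREPOSITION
        out_tokens.append(tok)
        labels.append(label)
    return out_tokens, labels


def precision_recall(gold: List[str], pred: List[str],
                     negative: str = CORRECT) -> Tuple[float, float]:
    """Token-level precision and recall over error labels (exact label match)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    true_pos = sum(1 for g, p in zip(gold, pred) if p != negative and p == g)
    predicted = sum(1 for p in pred if p != negative)
    actual = sum(1 for g in gold if g != negative)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = ["Es", "domāju", ",", "ka", "viņš", "brauks", "uz", "Rīgu", "."]
    # p_comma=1.0 and p_prep=0.0 make this corruption deterministic.
    tokens, labels = inject_errors(sentence, rng, p_comma=1.0, p_prep=0.0)
    print(list(zip(tokens, labels)))
    # A hypothetical tagger output with one correct and one spurious error label.
    predicted = ["O", "O", "COMMA", "O", "O", "PREP", "O", "O"]
    print(precision_recall(labels, predicted))  # (0.5, 1.0)
```

Pairs of corrupted tokens and labels like these could serve as training examples for a sequence tagger such as a BiLSTM. Enabling one corruption type at a time, or all of them together, loosely mirrors the distinction between single-type and all-type models in the abstract.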

Keywords

  • Grammar errors
  • Neural network
  • Word embeddings

Fig. 1.
Fig. 2.

Acknowledgment

The research has been supported by the European Regional Development Fund within the project “Neural Network Modelling for Inflected Natural Languages” No. 1.1.1.1/16/A/215.

Author information

Corresponding author: Daiga Deksne.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Deksne, D. (2019). Bidirectional LSTM Tagger for Latvian Grammatical Error Detection. In: Ekštein, K. (ed.) Text, Speech, and Dialogue. TSD 2019. Lecture Notes in Computer Science, vol 11697. Springer, Cham. https://doi.org/10.1007/978-3-030-27947-9_5

  • DOI: https://doi.org/10.1007/978-3-030-27947-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27946-2

  • Online ISBN: 978-3-030-27947-9

  • eBook Packages: Computer Science, Computer Science (R0)