Automatic Text Generation in Slovak Language
Automatic text generation can significantly reduce human effort in many everyday tasks. Recent advances in neural networks have supported further research in this area and have brought significant improvements in the quality of generated text. Unfortunately, most of this research deals with English, and the possibilities of text generation for Slavic languages have not yet been fully explored. Our work is concerned with automatic text generation and language modeling for the Slovak language. Since Slovak has a more complicated grammatical structure and morphology than English, the task of text generation is also more challenging. We applied neural approaches to natural language generation and performed several text generation experiments in both Slovak and English for two different domains. Additionally, we performed an experiment with human annotators to assess the quality of the generated texts. Our experiments showed promising results and suggest that neural networks are also adequate for text generation in Slovak.
Keywords: Natural language processing, Language modeling, Text generation
This work was partially supported by the Slovak Research and Development Agency under the contract No. APVV-17-0267 and No. APVV SK-IL-RD-18-0004 and the Scientific Grant Agency of the Slovak Republic, grant No. VG 1/0667/18 and grant No. VG 1/0725/19 and the education and research development project “STU as a digital leader”, project no. 002STU-2-1/2018 by the Ministry of Education, Science, Research and Sport of the Slovak Republic and by the student grant provided by Softec Pro Society.