Automatic Text Generation in Slovak Language

  • Dominik Vasko
  • Samuel Pecar
  • Marian Simko
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12011)

Abstract

Automatic text generation can significantly ease human effort in many everyday tasks. Recent advancements in neural networks have supported further research in this area and brought significant improvements in the quality of generated text. Unfortunately, most research deals with the English language, and the possibilities of text generation for Slavic languages have not been fully explored yet. Our work is concerned with automatic text generation and language modeling for the Slovak language. Since Slovak has a more complicated grammatical structure and richer morphology, the task of text generation is also more challenging. We experimented with neural approaches to natural language generation and performed several experiments with text generation in both Slovak and English for two different domains. Additionally, we performed an experiment with human annotators to assess the quality of the generated texts. Our experiments showed promising results, and we conclude that neural networks are sufficient for text generation in Slovak as well.
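The generation procedure evaluated in work like this is typically autoregressive: a language model assigns a probability distribution to the next token given the preceding context, one token is sampled (often with temperature scaling), and it is fed back as context for the next step. The following minimal sketch illustrates that loop with a hypothetical toy bigram model over a few Slovak words; the probabilities and vocabulary are invented for illustration and are not from the paper, and a real system would use a trained neural model (e.g., an LSTM) in place of the lookup table.

```python
import math
import random

def sample_next(probs, temperature=1.0, rng=random):
    """Sample a token from a next-token distribution after temperature scaling."""
    tokens = list(probs.keys())
    # Lower temperature sharpens the distribution; higher flattens it.
    logits = [math.log(probs[t]) / temperature for t in tokens]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    return rng.choices(tokens, weights=weights, k=1)[0]

def generate(model, start, length, temperature=1.0, rng=random):
    """Autoregressive generation: feed each sampled token back as context."""
    tokens = [start]
    for _ in range(length):
        next_dist = model[tokens[-1]]  # P(next | current) for a bigram model
        tokens.append(sample_next(next_dist, temperature, rng))
    return " ".join(tokens)

# Hypothetical bigram table standing in for a trained language model.
model = {
    "dnes":  {"je": 0.7, "bude": 0.3},
    "je":    {"pekne": 0.6, "zima": 0.4},
    "bude":  {"pekne": 0.5, "zima": 0.5},
    "pekne": {"dnes": 1.0},
    "zima":  {"dnes": 1.0},
}

print(generate(model, "dnes", 4, temperature=0.8))
```

The same loop applies unchanged when the bigram lookup is replaced by a neural network's softmax output; only the conditioning context grows beyond the last token.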

Keywords

Natural language processing · Language modeling · Text generation

Acknowledgments

This work was partially supported by the Slovak Research and Development Agency under the contract No. APVV-17-0267 and No. APVV SK-IL-RD-18-0004 and the Scientific Grant Agency of the Slovak Republic, grant No. VG 1/0667/18 and grant No. VG 1/0725/19 and the education and research development project “STU as a digital leader”, project no. 002STU-2-1/2018 by the Ministry of Education, Science, Research and Sport of the Slovak Republic and by the student grant provided by Softec Pro Society.


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, Bratislava, Slovakia