Cross-lingual transfer of abstractive summarizer to less-resource language

Abstract

Automatic text summarization extracts important information from texts and presents it in the form of a summary. Abstractive summarization approaches have progressed significantly by switching to deep neural networks, but the results are not yet satisfactory, especially for languages where large training sets do not exist. In several natural language processing tasks, cross-lingual model transfer has been successfully applied to less-resource languages. For summarization, cross-lingual model transfer has not been attempted, because the decoder side of neural models is not reusable and cannot correct target language generation. In our work, we use a pre-trained English summarization model based on deep neural networks and sequence-to-sequence architecture to summarize Slovene news articles. We address the problem of an inadequate decoder by using an additional language model to evaluate the generated text in the target language. We test several cross-lingual summarization models with different amounts of target data for fine-tuning. We assess the models with automatic evaluation measures and conduct a small-scale human evaluation. Automatic evaluation shows that the summaries of our best cross-lingual model are useful and of a quality similar to those of the model trained only in the target language. Human evaluation shows that our best model generates summaries with high accuracy and acceptable readability. However, similar to other abstractive models, our models are not perfect and may occasionally produce misleading or absurd content.
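
The abstract does not spell out how the additional language model is applied, so the following is only an illustrative sketch of one plausible reading: candidate summaries produced by the transferred summarizer are re-ranked by combining the summarizer's own score with a fluency score from a target-language language model. The toy bigram model with add-one smoothing stands in for the neural language model, and every name here (train_bigram_lm, avg_log_prob, rerank, lm_weight) is hypothetical rather than taken from the authors' code.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def avg_log_prob(sentence, unigrams, bigrams):
    """Length-normalized log-probability under the add-one smoothed bigram model."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    vocab_size = len(unigrams) + 1  # +1 reserves probability mass for unseen words
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total / (len(tokens) - 1)


def rerank(candidates, unigrams, bigrams, lm_weight=0.5):
    """Sort (text, summarizer_score) pairs, best first, by a weighted sum of
    the summarizer score and the language model fluency score.
    The linear combination and lm_weight are assumptions of this sketch."""
    def combined(item):
        text, summarizer_score = item
        fluency = avg_log_prob(text, unigrams, bigrams)
        return (1 - lm_weight) * summarizer_score + lm_weight * fluency
    return sorted(candidates, key=combined, reverse=True)


# Tiny made-up Slovene corpus and candidates, for illustration only.
corpus = [
    "vlada je sprejela nov zakon",
    "vlada je potrdila proračun",
    "parlament je sprejel zakon",
]
unigrams, bigrams = train_bigram_lm(corpus)

# Both candidates get the same (hypothetical) summarizer score, so the
# language model alone decides the order.
candidates = [
    ("zakon sprejela je vlada", -1.0),
    ("vlada je sprejela zakon", -1.0),
]
ranked = rerank(candidates, unigrams, bigrams)
print(ranked[0][0])
```

In this sketch the language model only breaks ties between candidates; in practice the weight between the two scores would need tuning on held-out target-language data.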

Availability of data and material

The data used in the study were extracted from the Slovene Gigafida corpus (see note 4). Restrictions apply to the availability of these data, so the extracted summarization dataset is available upon email request to the authors.

Code Availability

The source code of our summarization system is freely available at https://github.com/azagsam/cross-lingual-summarization.

Notes

  1. https://cs.nyu.edu/~kcho/DMQA/

  2. https://fasttext.cc/docs/en/crawl-vectors.html

  3. https://github.com/azagsam/cross-lingual-summarization

  4. https://www.cjvt.si/en/research/cjvt-projects/gigafida-corpus/

Acknowledgements

The research was supported by the Slovene Research Agency through research core funding no. P6-0411 and project no. J6-2581. It was also financially supported by the European Social Fund and the Republic of Slovenia, Ministry of Education, Science and Sport, through the project Quality of Slovene Textbooks (KaUč), and by the Ministry of Culture of the Republic of Slovenia through the project Development of Slovene in Digital Environment (RSDO). This paper is supported by the European Union’s Horizon 2020 Programme project EMBEDDIA (Cross-Lingual Embeddings for Less-Represented Languages in European News Media, grant no. 825153).

Author information

Corresponding author

Correspondence to Aleš Žagar.

Cite this article

Žagar, A., Robnik-Šikonja, M. Cross-lingual transfer of abstractive summarizer to less-resource language. J Intell Inf Syst (2021). https://doi.org/10.1007/s10844-021-00663-8

Keywords

  • Automatic summarization
  • Text generation
  • Deep neural networks
  • Language models
  • Cross-lingual embeddings
  • Abstractive summarization