
Empowering Multilingual Abstractive Text Summarization: A Comparative Study of Word Embedding Techniques

  • Conference paper
Evolutionary Artificial Intelligence (ICEASSM 2017)

Part of the book series: Algorithms for Intelligent Systems (AIS)


Abstract

Multilingual abstractive text summarization is a critical natural language processing task that involves generating concise, coherent summaries in multiple languages. Word embeddings have emerged as a pivotal technique in this domain: they represent words as continuous vectors that capture semantic relationships and contextual information. This paper presents an in-depth exploration of the role of word embeddings in enhancing multilingual abstractive text summarization, focusing on the Hindi and Marathi languages. The study investigates several word embedding techniques, including ELMo, BERT, and XLNet, and their impact on the quality of abstractive summarization for Hindi and Marathi texts, through comprehensive experiments with deep learning-based summarization models trained on diverse datasets in the target languages. The performance of each word embedding technique was analyzed using metrics such as ROUGE (Recall-Oriented Understudy for Gisting Evaluation). The findings reveal that the choice of word embeddings significantly influences summarization quality for both languages: certain embeddings better capture the linguistic nuances and semantic representations specific to each language, yielding more coherent and informative summaries. The paper also explores the efficacy of multilingual word embeddings in cross-lingual summarization, demonstrating promising results in preserving semantic relationships across Hindi and Marathi and leading to improved summarization outcomes.
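The ROUGE metric mentioned in the abstract can be illustrated with a minimal, self-contained sketch of ROUGE-N computation. This is a simplified version using whitespace tokenization; published evaluations (including, presumably, this paper's) typically use a reference implementation such as the `rouge-score` package, often with stemming enabled.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset (Counter) of n-grams from a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(reference, candidate, n=1):
    """Compute ROUGE-N (precision, recall, F1) between two
    whitespace-tokenized strings. Simplified sketch: no stemming,
    no stopword handling, single reference only."""
    ref = ngrams(reference.split(), n)
    cand = ngrams(candidate.split(), n)
    if not ref or not cand:
        return 0.0, 0.0, 0.0
    # Clipped n-gram overlap: each n-gram counts at most as often
    # as it appears in either text.
    overlap = sum((ref & cand).values())
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

# Example: five of six unigrams match in each direction.
p, r, f = rouge_n("the cat sat on the mat", "the cat lay on the mat", n=1)
```

ROUGE-1 counts unigram overlap, ROUGE-2 bigram overlap; recall-oriented variants like these reward summaries that recover the content of the reference summary, which is why ROUGE remains the standard automatic metric for abstractive summarization.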




Corresponding author

Correspondence to Neha Rane.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Rane, N., Govilkar, S.S. (2024). Empowering Multilingual Abstractive Text Summarization: A Comparative Study of Word Embedding Techniques. In: Asirvatham, D., Gonzalez-Longatt, F.M., Falkowski-Gilski, P., Kanthavel, R. (eds) Evolutionary Artificial Intelligence. ICEASSM 2017. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-99-8438-1_11
