Abstract
Multilingual abstractive text summarization is a critical natural language processing task that involves generating concise, coherent summaries in multiple languages. Word embeddings have emerged as a pivotal technique in this domain: they represent words as continuous vectors that capture semantic relationships and contextual information. This paper presents an in-depth exploration of the role of word embeddings in enhancing multilingual abstractive text summarization, focusing specifically on Hindi and Marathi. The study investigates several word embedding techniques, including ELMo, BERT, and XLNet, and measures their impact on the quality of abstractive summarization of Hindi and Marathi texts through comprehensive experiments with deep learning-based summarization models trained on diverse datasets in the target languages. The performance of each embedding technique was analyzed using metrics such as ROUGE (Recall-Oriented Understudy for Gisting Evaluation). The findings reveal that the choice of word embeddings significantly influences summarization quality for both languages: certain embeddings better capture the linguistic nuances and semantic representations specific to each language, yielding more coherent and informative summaries. The paper also explores the efficacy of multilingual word embeddings for cross-lingual summarization, demonstrating promising results in preserving semantic relationships across Hindi and Marathi and thereby improving summarization outcomes.
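The ROUGE evaluation mentioned above rewards n-gram overlap between a generated summary and a reference summary. As a minimal illustrative sketch (not the authors' evaluation pipeline), ROUGE-N precision, recall, and F1 can be computed from token n-gram counts; production evaluations typically use a dedicated library with stemming and language-appropriate tokenization, which matters for Hindi and Marathi:

```python
from collections import Counter

def rouge_n(reference: str, candidate: str, n: int = 1) -> dict:
    """Compute ROUGE-N precision, recall, and F1 from whitespace tokens.

    Illustrative sketch only: whitespace tokenization is a simplifying
    assumption; real setups use language-specific tokenizers.
    """
    def ngrams(tokens, n):
        # Multiset of n-grams, so repeated n-grams are counted correctly.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref = ngrams(reference.split(), n)
    cand = ngrams(candidate.split(), n)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, `rouge_n("the cat sat on the mat", "the cat sat")` gives recall 0.5 (3 of 6 reference unigrams matched) and precision 1.0.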
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Rane, N., Govilkar, S.S. (2024). Empowering Multilingual Abstractive Text Summarization: A Comparative Study of Word Embedding Techniques. In: Asirvatham, D., Gonzalez-Longatt, F.M., Falkowski-Gilski, P., Kanthavel, R. (eds) Evolutionary Artificial Intelligence. ICEASSM 2017. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-99-8438-1_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8437-4
Online ISBN: 978-981-99-8438-1
eBook Packages: Intelligent Technologies and Robotics (R0)