Abstract
Multilingual abstractive text summarization is a critical natural language processing task that involves generating concise, coherent summaries in multiple languages. Word embeddings have emerged as a pivotal technique in this domain: they represent words as continuous vectors that capture semantic relationships and contextual information. This paper presents an in-depth exploration of the role of word embeddings in enhancing multilingual abstractive text summarization, focusing specifically on Hindi and Marathi. The study investigates several word embedding techniques, including ELMo, BERT, and XLNet, and measures their impact on the quality of abstractive summarization of Hindi and Marathi texts through comprehensive experiments with deep learning-based summarization models trained on diverse datasets in the target languages. The performance of each embedding technique was analyzed using metrics such as ROUGE (Recall-Oriented Understudy for Gisting Evaluation). The findings reveal that the choice of word embeddings significantly influences summarization quality for both languages: certain embeddings better capture the linguistic nuances and semantic representations specific to each language, yielding more coherent and informative summaries. The paper also explores the efficacy of multilingual word embeddings for cross-lingual summarization, demonstrating promising results in preserving semantic relationships across Hindi and Marathi and thereby improving summarization outcomes.
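The ROUGE evaluation mentioned above rewards n-gram overlap between a generated summary and a reference summary. As a minimal illustrative sketch (not the authors' evaluation pipeline), ROUGE-N precision, recall, and F1 can be computed from token n-gram counts; production evaluations typically use a dedicated library with stemming and language-appropriate tokenization, which matters for Hindi and Marathi:

```python
from collections import Counter

def rouge_n(reference: str, candidate: str, n: int = 1) -> dict:
    """Compute ROUGE-N precision, recall, and F1 from whitespace tokens.

    Illustrative sketch only: whitespace tokenization is a simplifying
    assumption; real setups use language-specific tokenizers.
    """
    def ngrams(tokens, n):
        # Multiset of n-grams, so repeated n-grams are counted correctly.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref = ngrams(reference.split(), n)
    cand = ngrams(candidate.split(), n)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, `rouge_n("the cat sat on the mat", "the cat sat")` gives recall 0.5 (3 of 6 reference unigrams matched) and precision 1.0.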
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Rane, N., Govilkar, S.S. (2024). Empowering Multilingual Abstractive Text Summarization: A Comparative Study of Word Embedding Techniques. In: Asirvatham, D., Gonzalez-Longatt, F.M., Falkowski-Gilski, P., Kanthavel, R. (eds) Evolutionary Artificial Intelligence. ICEASSM 2017. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-99-8438-1_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8437-4
Online ISBN: 978-981-99-8438-1
eBook Packages: Intelligent Technologies and Robotics (R0)