Abstract
Topic modeling is a statistical technique for discovering the hidden topics, or keywords, in a collection of documents. It can also be viewed as a probabilistic approach to learning, analyzing, and discovering topics in a document collection. The most popular topic modeling techniques are latent semantic analysis (LSA), probabilistic latent semantic analysis (pLSA), latent Dirichlet allocation (LDA), and the more recent deep learning-based lda2vec. In extractive multi-document summarization, LDA is most commonly used to determine whether an extracted sentence reflects the concept of the input document. In this paper, we survey multi-document summarization techniques that use LDA as the topic modeling method to improve the coverage of the final summary and to reduce redundancy. Finally, we compare LDA and LSA using the Gensim toolkit; our experimental results show that LDA outperforms LSA when the number of features considered for sentence selection is increased.
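The role LDA plays in sentence selection can be illustrated with a minimal sketch. The code below is not the Gensim-based experiment reported in this paper: it is a small collapsed Gibbs sampler for LDA written with the Python standard library only. The function names (`lda_gibbs`, `cosine`), the toy sentences, the hyperparameters, and the scoring rule (cosine similarity between a sentence's topic mixture and the average mixture of the collection) are all assumptions made for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents (illustrative)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                        # total tokens in topic k
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions (theta); each row sums to 1.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0


# Toy input (hypothetical): each sentence is treated as one "document".
sentences = [
    "topic models discover hidden topics in documents".split(),
    "lda is a probabilistic topic model".split(),
    "summaries should cover topics and avoid redundancy".split(),
]
theta, _ = lda_gibbs(sentences, num_topics=2)

# Average topic mixture of the collection, used as the reference "concept".
collection = [sum(col) / len(theta) for col in zip(*theta)]

# Score each sentence by how closely its mixture matches the collection's.
scores = [cosine(row, collection) for row in theta]
ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
```

Here `ranked` orders the sentence indices from most to least representative under this assumed scoring rule. Real systems, including those surveyed in this paper, typically combine topic scores with other sentence features and a redundancy check, and toolkits such as Gensim fit LDA with a different inference procedure than the Gibbs sampler shown here.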
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mohan, G.B., Kumar, R.P. (2022). A Comprehensive Survey on Topic Modeling in Text Summarization. In: Sharma, D.K., Peng, SL., Sharma, R., Zaitsev, D.A. (eds) Micro-Electronics and Telecommunication Engineering. ICMETE 2021. Lecture Notes in Networks and Systems, vol 373. Springer, Singapore. https://doi.org/10.1007/978-981-16-8721-1_22
DOI: https://doi.org/10.1007/978-981-16-8721-1_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8720-4
Online ISBN: 978-981-16-8721-1
eBook Packages: Engineering, Engineering (R0)