
A Comprehensive Survey on Topic Modeling in Text Summarization

  • Conference paper
Micro-Electronics and Telecommunication Engineering (ICMETE 2021)

Abstract

Topic modeling is a statistical approach for discovering the hidden topics, or keywords, in a collection of documents. It is also viewed as a probabilistic framework for learning, analyzing, and discovering topics from a document collection. The most popular topic modeling techniques are latent semantic analysis (LSA), probabilistic latent semantic analysis (pLSA), latent Dirichlet allocation (LDA), and the more recent deep learning-based lda2vec. Of these, LDA is the most commonly used in extractive multi-document summarization, where it helps determine whether an extracted sentence reflects the concepts of the input documents. In this paper, we explore various multi-document summarization techniques that use LDA as the topic modeling method to improve the coverage of the final summary and to reduce redundancy. Finally, we compare LDA and LSA using the Gensim toolkit; our experimental results show that LDA outperforms LSA when more features are considered for sentence selection.
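The abstract describes LDA being used in extractive multi-document summarization to check whether a candidate sentence reflects the topics of the input and to keep the summary free of redundancy. The sketch below illustrates that idea under explicit assumptions; it is not the authors' method, it does not reproduce their LDA-versus-LSA experiment, and it does not use Gensim's API. Assumptions: each sentence is treated as a tiny LDA document; topics are inferred by collapsed Gibbs sampling (a swapped-in inference technique chosen to avoid dependencies; Gensim's `LdaModel` uses online variational Bayes instead); a sentence's coverage is the cosine similarity between its topic mixture and a token-weighted input-level mixture; and redundancy is controlled by a maximal-marginal-relevance (MMR) style penalty, also swapped in. The names `STOP`, `tokenize`, `lda_gibbs`, `cosine`, `summarize`, and the parameter `lam` are hypothetical.

```python
import math
import random

# Hypothetical sketch: topic-aware extractive summarization with LDA.
# Not the surveyed papers' method and not Gensim's API; collapsed Gibbs
# sampling and the MMR-style redundancy penalty are swapped-in techniques.

STOP = {"a", "an", "and", "are", "as", "by", "each", "for", "from", "in",
        "is", "it", "of", "on", "the", "to", "with"}


def tokenize(sentence):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    words = (t.strip(".,;:!?\"'()").lower() for t in sentence.split())
    return [w for w in words if w and w not in STOP]


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic mixture per doc."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]       # doc-topic counts
    nkw = [[0] * V for _ in range(num_topics)]   # topic-word counts
    nk = [0] * num_topics                        # total tokens per topic
    z = []                                       # topic assigned to each token
    for d, doc in enumerate(docs):               # random initialization
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], z[d][i]
                ndk[d][k] -= 1                   # remove the token's assignment
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta)
                           / (nk[t] + V * beta) for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k                      # re-add with the sampled topic
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Posterior-mean topic mixture (theta) for each document.
    return [[(ndk[d][t] + alpha) / (len(doc) + num_topics * alpha)
             for t in range(num_topics)] for d, doc in enumerate(docs)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def summarize(sentences, num_topics=2, k=2, lam=0.7, seed=0):
    """Pick k sentences whose topics cover the input, penalizing overlap."""
    docs = [tokenize(s) for s in sentences]
    theta = lda_gibbs(docs, num_topics, seed=seed)
    total = sum(len(d) for d in docs) or 1
    # Input-level topic mixture: token-weighted average of sentence mixtures.
    corpus = [sum(len(d) * th[t] for d, th in zip(docs, theta)) / total
              for t in range(num_topics)]

    def score(i, chosen):
        coverage = cosine(theta[i], corpus)
        redundancy = max((cosine(theta[i], theta[j]) for j in chosen), default=0.0)
        return lam * coverage - (1 - lam) * redundancy

    chosen, remaining = [], list(range(len(sentences)))
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda i: score(i, chosen))
        chosen.append(best)
        remaining.remove(best)
    return [sentences[i] for i in sorted(chosen)]  # keep original order


if __name__ == "__main__":
    sample = [
        "LDA models each document as a mixture of latent topics.",
        "Each latent topic in LDA is a probability distribution over words.",
        "Extractive summarization selects salient sentences from the input documents.",
        "Redundant sentences reduce the coverage of an extractive summary.",
    ]
    for sentence in summarize(sample, num_topics=2, k=2):
        print(sentence)
```

Raising `lam` favors topic coverage, while lowering it penalizes sentences whose topic mixtures resemble those already selected. With only a handful of short sentences the inferred topics are noisy, so a practical system would fit the topic model on a larger collection.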




Author information


Correspondence to G. Bharathi Mohan.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Mohan, G.B., Kumar, R.P. (2022). A Comprehensive Survey on Topic Modeling in Text Summarization. In: Sharma, D.K., Peng, SL., Sharma, R., Zaitsev, D.A. (eds) Micro-Electronics and Telecommunication Engineering. ICMETE 2021. Lecture Notes in Networks and Systems, vol 373. Springer, Singapore. https://doi.org/10.1007/978-981-16-8721-1_22


  • DOI: https://doi.org/10.1007/978-981-16-8721-1_22


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-8720-4

  • Online ISBN: 978-981-16-8721-1

  • eBook Packages: Engineering, Engineering (R0)
