An Automatic Text Summary Method Based on LDA Model

  • Caiquan Xiong
  • Li Shen (corresponding author)
  • Zhuang Wang
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 96)


Automatic document summarization condenses a document into a summary that represents the whole text, helping readers extract the important information quickly. To address the lack of semantic information in document summaries, this paper proposes a weighted hybrid document summarization model based on LDA. The model analyses the document to obtain its topic probability distribution. First, FCNNM (Fine-grained Convolutional Neural Network Model) is used to extract semantic features. Next, heuristic rules capture surface features of the text, including sentence length, sentence position, and the TF-IDF of the words in the sentence, and all features are weighted to compute a sentence score. Finally, a greedy algorithm selects the sentences that form the summary. Experiments show that the proposed model compensates for the weak semantic link between summary sentences and the source text in traditional algorithms, reduces redundancy in the summary, and improves summary quality.
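The scoring and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the LDA and FCNNM components are not reproduced, and the `semantic` argument is a hypothetical placeholder for whatever per-sentence semantic score such a model would supply. The feature definitions (relative length, earlier-is-better position, smoothed TF-IDF with each sentence treated as a document), the weights, and the Jaccard-overlap redundancy check are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()").lower() for w in sentence.split())
    return [w for w in words if w]


def tfidf_feature(token_lists):
    """Sum of TF-IDF weights per sentence, scaled to [0, 1] (assumed form)."""
    n = len(token_lists)
    df = Counter(w for tokens in token_lists for w in set(tokens))
    raw = []
    for tokens in token_lists:
        tf = Counter(tokens)
        raw.append(sum((c / len(tokens)) * (math.log((1 + n) / (1 + df[w])) + 1)
                       for w, c in tf.items()))
    top = max(raw, default=0.0)
    return [r / top if top else 0.0 for r in raw]


def score_sentences(sentences, semantic=None, weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted sum of semantic, length, position and TF-IDF features.

    `semantic` stands in for a model-derived score; the weights are illustrative.
    """
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)
    if semantic is None:
        semantic = [0.0] * n
    longest = max((len(t) for t in token_lists), default=0)
    tfidf = tfidf_feature(token_lists)
    w_sem, w_len, w_pos, w_tfidf = weights
    scores = []
    for i, tokens in enumerate(token_lists):
        length = len(tokens) / longest if longest else 0.0
        position = 1.0 - i / n  # earlier sentences score higher
        scores.append(w_sem * semantic[i] + w_len * length
                      + w_pos * position + w_tfidf * tfidf[i])
    return scores, token_lists


def jaccard(a, b):
    """Word-set overlap between two token lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def greedy_summary(sentences, k, semantic=None, max_overlap=0.5):
    """Pick up to k top-scoring sentences, skipping near-duplicates."""
    scores, token_lists = score_sentences(sentences, semantic)
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    chosen = []
    for i in ranked:
        if len(chosen) == k:
            break
        if all(jaccard(token_lists[i], token_lists[j]) < max_overlap for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in sorted(chosen)]  # restore document order
```

Calling `greedy_summary(sentences, k=2)` on a list of sentences returns at most two of them in their original order. The overlap threshold is one simple way to keep redundancy low during greedy selection; the paper may use a different criterion.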



This research is supported by the National Key Research and Development Scheme of China under grant number 2017YFC1405403, the National Natural Science Foundation of China under grant number 61075059, the Green Industry Technology Leading Project (product development category) of Hubei University of Technology under grant number CPYF2017008, and the Philosophical and Social Sciences Research Project of Hubei Education Department under grant number 19Q054.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. School of Computer Science, Hubei University of Technology, Wuhan, China
