An Automatic Text Summary Method Based on LDA Model
Automatic document summarization refines a document into a summary that represents the whole text, helping readers extract important information quickly. To address the lack of semantic information in document summaries, this paper proposes a weighted hybrid document summarization model based on LDA. The model obtains the topic distribution probability by analysing the document. First, we use an FCNNM (Fine-grained Convolutional Neural Network Model) to extract semantic features; then we gather surface information about the text from heuristic rules, including sentence length, sentence position, and the TF-IDF of the words in each sentence, and combine these features with weights to compute a sentence score. Finally, a greedy algorithm selects sentences to form the summary. Experiments show that the proposed model effectively compensates for the semantic gap between summary sentences and the source text in traditional algorithms, reduces redundancy in document summaries, and improves summary quality.
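The surface-feature scoring and greedy selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature weights (`w_len`, `w_pos`, `w_tfidf`), the position heuristic, and the overlap threshold are all illustrative assumptions, and the LDA/FCNNM semantic features are omitted.

```python
import math
from collections import Counter

def sentence_scores(sentences, w_len=0.3, w_pos=0.3, w_tfidf=0.4):
    """Score each sentence by a weighted mix of surface features:
    relative length, position, and mean TF-IDF of its words.
    The weights are illustrative, not taken from the paper."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # Document frequency of each word across sentences (for the IDF term).
    df = Counter(w for d in docs for w in set(d))
    max_len = max(len(d) for d in docs)
    scores = []
    for i, words in enumerate(docs):
        tf = Counter(words)
        tfidf = sum((tf[w] / len(words)) * math.log(n / df[w] + 1)
                    for w in tf) / len(tf)
        len_score = len(words) / max_len
        # Simple lead/position bias: earlier sentences score higher.
        pos_score = 1.0 / (i + 1)
        scores.append(w_len * len_score + w_pos * pos_score + w_tfidf * tfidf)
    return scores

def greedy_select(sentences, scores, k=2, max_overlap=0.5):
    """Greedily pick the top-scoring sentences, skipping any sentence
    whose word overlap with an already-chosen one exceeds max_overlap
    (a simple redundancy control)."""
    chosen = []
    for i in sorted(range(len(sentences)), key=lambda i: -scores[i]):
        words = set(sentences[i].lower().split())
        if any(len(words & set(c.lower().split())) / len(words) > max_overlap
               for c in chosen):
            continue
        chosen.append(sentences[i])
        if len(chosen) == k:
            break
    return chosen
```

In a full pipeline, the semantic similarity from the LDA topic distribution and the FCNNM features would enter the weighted sum alongside these surface scores.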
This research is supported by the National Key Research and Development Program of China under grant 2017YFC1405403, the National Natural Science Foundation of China under grant 61075059, the Green Industry Technology Leading Project (product development category) of Hubei University of Technology under grant CPYF2017008, and the Philosophical and Social Sciences Research Project of the Hubei Education Department under grant 19Q054.