Abstract
A method for automatic abstracting based on text clustering and natural language understanding is explored, with the aim of overcoming the shortcomings of some current methods. The method uses text clustering and can produce abstracts of multiple documents. A two-pass word segmentation algorithm based on the title and the first sentences of paragraphs is investigated; its precision and recall are above 95%. For the specific domain of plastics, an automatic abstracting system named TCAAS is implemented, and its precision and recall for multi-document automatic abstracting are above 75%. The experiments also demonstrate that the method is a feasible basis for developing domain-specific automatic abstracting systems and is valuable for further in-depth study.
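The abstract reports precision and recall for word segmentation but does not say how they were scored. A common convention for Chinese segmentation, assumed here and not confirmed by the paper, treats each word as a character span and counts a predicted word as correct only if both of its boundaries match a gold word. The sketch below illustrates that convention; the function names `spans` and `segmentation_pr` are hypothetical.

```python
def spans(words):
    """Convert a list of segmented words into a set of (start, end) character spans."""
    result, pos = set(), 0
    for w in words:
        result.add((pos, pos + len(w)))
        pos += len(w)
    return result


def segmentation_pr(gold, predicted):
    """Return (precision, recall) of a predicted segmentation against a gold one.

    Assumption: a predicted word counts as correct only if its exact span
    also appears in the gold segmentation.
    """
    g, p = spans(gold), spans(predicted)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    return precision, recall


# Gold: 自动 / 文摘 / 系统 ("automatic / abstract / system");
# the prediction wrongly merges the first two words.
print(segmentation_pr(["自动", "文摘", "系统"], ["自动文摘", "系统"]))
```

Here only 系统 matches, so precision is 1/2 and recall is 1/3. Under this convention, the reported figure of above 95% would mean that nearly every predicted word boundary pair agrees with the reference segmentation.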
Additional information
Translated from Transactions of Beijing Institute of Technology (Natural Science Edition), 2005, 25(8): 705–709 (in Chinese)
About this article
Cite this article
Guo, Ql., Fan, Xz. & Liu, Ca. The research and realization about automatic abstracting based on text clustering and natural language understanding. Front. Electr. Electron. Eng. China 1, 460–464 (2006). https://doi.org/10.1007/s11460-006-0088-y
Issue Date:
DOI: https://doi.org/10.1007/s11460-006-0088-y