Research on multi-feature fusion algorithm for subject words extraction and summary generation of text

Xu, Gui-Xian; Yao, Hai-Shen; Wang, Changzhi

doi:10.1007/s10586-017-1219-3


Published: 16 October 2017

Volume 22, pages 10883–10895, (2019)

Cluster Computing


Abstract

Subject words convey the essential information of a text, and an automatic summary reflects its theme and core content. This paper studies a multi-feature fusion algorithm for subject word extraction and summary generation on Tibetan web text. First, Tibetan web pages are collected and preprocessed to extract their useful information. Second, the BCCF word segmentation algorithm is used to segment the text into words. A multi-feature fusion algorithm is then proposed to extract the subject words of the text: it combines factors such as a word's frequency, length, and type to calculate each word's weight and select the subject words. For summary generation, a sentence weighting algorithm is designed based on word frequency, position, and other features. The summary generation algorithm computes the sentence weights, removes redundant sentences, and forms the text summary. Experiments show that the multi-feature fusion algorithm achieves good results on both subject word extraction and summary generation. The work is a useful contribution to research on Tibetan information processing.
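The abstract names the features that are fused but gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the text has already been segmented and tagged (the BCCF segmenter is not reproduced here). The fusion coefficients `ALPHA`, `BETA`, `GAMMA`, the `TYPE_SCORE` table, the position bonus for the first and last sentence, and the Jaccard-overlap redundancy test are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical fusion coefficients: the abstract lists the features, not their weights.
ALPHA, BETA, GAMMA = 0.6, 0.2, 0.2
# Hypothetical scores per word type; nouns are assumed to be the most topical.
TYPE_SCORE = {"noun": 1.0, "verb": 0.6, "other": 0.2}


def word_weights(tagged_words):
    """Fuse normalized frequency, length and type into one weight per word.

    tagged_words: list of (word, type) pairs produced by an upstream segmenter.
    """
    if not tagged_words:
        return {}
    freq = Counter(word for word, _ in tagged_words)
    max_freq = max(freq.values())
    max_len = max(len(word) for word in freq)
    types = dict(tagged_words)  # keeps the last type seen for each word
    return {
        word: ALPHA * freq[word] / max_freq
        + BETA * len(word) / max_len
        + GAMMA * TYPE_SCORE.get(types[word], TYPE_SCORE["other"])
        for word in freq
    }


def subject_words(tagged_words, k=3):
    """Return the k highest-weighted words (ties broken alphabetically)."""
    weights = word_weights(tagged_words)
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]


def jaccard(a, b):
    """Word-set overlap used here as an assumed redundancy measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(sentences, weights, n=2, position_bonus=0.5, max_overlap=0.5):
    """Pick up to n sentence indices, returned in document order.

    sentences: list of token lists. A sentence is scored by the mean weight
    of its words, plus an assumed bonus if it is the first or last sentence.
    """
    scored = []
    for i, sent in enumerate(sentences):
        if not sent:
            continue
        score = sum(weights.get(w, 0.0) for w in sent) / len(sent)
        if i == 0 or i == len(sentences) - 1:
            score += position_bonus
        scored.append((score, i))

    chosen = []
    for _, i in sorted(scored, key=lambda t: (-t[0], t[1])):
        words = set(sentences[i])
        # Skip a sentence that overlaps too much with one already selected.
        if any(jaccard(words, set(sentences[j])) > max_overlap for j in chosen):
            continue
        chosen.append(i)
        if len(chosen) == n:
            break
    return sorted(chosen)
```

Normalizing each feature to the range 0 to 1 before mixing keeps any single feature from dominating because of its scale. The greedy selection loop is one simple way to remove redundant sentences, and other similarity measures could replace the Jaccard overlap.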



Acknowledgements

This work was supported by the Beijing Social Science Foundation (No. 14WYB040), the First-Class University and First-Class Discipline construction funds of Minzu University of China (No. 2017MDYL12), the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (No. 2014BAK10B03), and the National Natural Science Foundation of China (Nos. 61309012 and 61331013).

Author information

Authors and Affiliations

Information Engineering College, Minzu University of China, Beijing, China
Gui-Xian Xu, Hai-Shen Yao & Changzhi Wang

Corresponding author

Correspondence to Gui-Xian Xu.

About this article

Cite this article

Xu, GX., Yao, HS. & Wang, C. Research on multi-feature fusion algorithm for subject words extraction and summary generation of text. Cluster Comput 22 (Suppl 5), 10883–10895 (2019). https://doi.org/10.1007/s10586-017-1219-3

Received: 17 August 2017
Revised: 15 September 2017
Accepted: 22 September 2017
Published: 16 October 2017
Issue Date: September 2019
DOI: https://doi.org/10.1007/s10586-017-1219-3
