Research on multi-feature fusion algorithm for subject words extraction and summary generation of text

Cluster Computing

Abstract

Subject words give a brief indication of what a text is about, and an automatic summary reflects its theme and core content. This paper studies a multi-feature fusion algorithm for subject word extraction and summary generation on Tibetan web text. First, Tibetan web pages are collected and preprocessed to extract the useful information from them. Second, the BCCF word segmentation algorithm is used to segment the text into words. A multi-feature fusion algorithm is then proposed to extract the subject words of the text: it combines several factors, such as a word's frequency, length, and type, to calculate each word's weight and select the text's subject words. For summary generation, a sentence weighting algorithm is designed in terms of word frequency, position, and other features. The summary generation algorithm computes the sentences' weights, removes redundant sentences, and forms the text summary. Experiments show that the multi-feature fusion algorithms for subject word extraction and summary generation achieve good results. The research is useful for the study of Tibetan information processing.
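The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the linear fusion, the coefficients ALPHA, BETA and GAMMA, the TYPE_SCORE table, the first-sentence bonus, and the Jaccard-overlap redundancy test are all assumptions chosen for illustration. The input is assumed to be already segmented and tagged (BCCF segmentation is not reproduced here), and English tokens stand in for Tibetan words.

```python
from collections import Counter

# Hypothetical fusion coefficients; the abstract does not state the real ones.
ALPHA, BETA, GAMMA = 0.6, 0.2, 0.2
# Hypothetical score per word type (e.g. part of speech).
TYPE_SCORE = {"noun": 1.0, "verb": 0.6, "other": 0.2}


def word_weights(tagged_words):
    """Fuse frequency, length and type into one weight per distinct word.

    tagged_words is a list of (word, type) pairs from a segmenter.
    """
    freq = Counter(word for word, _ in tagged_words)
    max_freq = max(freq.values())
    max_len = max(len(word) for word in freq)
    types = dict(tagged_words)  # keeps the last type seen for each word
    return {
        word: ALPHA * freq[word] / max_freq
        + BETA * len(word) / max_len
        + GAMMA * TYPE_SCORE.get(types[word], TYPE_SCORE["other"])
        for word in freq
    }


def subject_words(tagged_words, k=3):
    """Return the k highest-weighted words (ties broken alphabetically)."""
    weights = word_weights(tagged_words)
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]


def summarize(sentences, weights, n=2, max_overlap=0.5, lead_bonus=0.2):
    """Pick up to n sentence indices by weight, skipping redundant ones.

    sentences is a list of token lists in document order. A sentence is
    treated as redundant when its Jaccard overlap with an already chosen
    sentence exceeds max_overlap (an assumed criterion).
    """
    scored = []
    for i, sent in enumerate(sentences):
        if not sent:
            continue
        score = sum(weights.get(w, 0.0) for w in sent) / len(sent)
        if i == 0:
            score += lead_bonus  # assumed position feature: favor the lead
        scored.append((score, i))

    chosen = []
    for _, i in sorted(scored, key=lambda t: (-t[0], t[1])):
        cand = set(sentences[i])
        redundant = any(
            len(cand & set(sentences[j])) / len(cand | set(sentences[j]))
            > max_overlap
            for j in chosen
        )
        if not redundant:
            chosen.append(i)
        if len(chosen) == n:
            break
    return sorted(chosen)  # restore document order


if __name__ == "__main__":
    tagged = [("tibet", "noun"), ("text", "noun"), ("summary", "noun"),
              ("is", "other"), ("tibet", "noun"), ("text", "noun"),
              ("tibet", "noun")]
    sents = [["tibet", "text"], ["tibet", "text", "is"], ["summary", "is"]]
    print(subject_words(tagged, k=2))
    print(summarize(sents, word_weights(tagged), n=2))
```

Normalizing each feature to the range 0 to 1 before fusing keeps any single feature from dominating through scale alone; the relative importance is then controlled only by the coefficients, which would need to be tuned on real data.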

Acknowledgements

This work was supported by the Beijing Social Science Foundation (No. 14WYB040), the First-Class University and First-Class Discipline construction funds of Minzu University of China (No. 2017MDYL12), the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (No. 2014BAK10B03), and the National Natural Science Foundation of China (Nos. 61309012 and 61331013).

Author information

Corresponding author

Correspondence to Gui-Xian Xu.

About this article

Cite this article

Xu, GX., Yao, HS. & Wang, C. Research on multi-feature fusion algorithm for subject words extraction and summary generation of text. Cluster Comput 22 (Suppl 5), 10883–10895 (2019). https://doi.org/10.1007/s10586-017-1219-3

  • DOI: https://doi.org/10.1007/s10586-017-1219-3
