Neural Computing and Applications, Volume 32, Issue 1, pp 73–83

Research on topic discovery technology for Web news

  • Guixian Xu (Email author)
  • Ziheng Yu
  • Changzhi Wang
  • Antai Wang
S.I.: Brain-Inspired Computing and Machine Learning for Brain Health

Abstract

With the development of information technology, Web news has become the main channel of information dissemination. Web news topic discovery helps users quickly find valuable information, and research on it continues to advance. Traditional topic discovery is based on the vector space model, which suffers from high dimensionality and data sparsity. Latent semantic analysis, by contrast, maps the high-dimensional, sparse word space to a k-dimensional semantic space and uses the semantic correlation between words to increase the similarity of news items on the same topic. This paper studies Web news topic discovery. First, the set of Web news texts is vectorized, and the weight of each feature in the texts is calculated by an improved TFIDF. The original text vector set is then analysed by latent semantic analysis to exploit the semantic relations between texts and words, and news topics are extracted by a clustering approach. Sub-topics are extracted and displayed through word co-occurrence: the sub-topic vector is built from the co-occurring words. The experimental results show that the proposed method can effectively capture current hot topics in Web news and their related sub-topics. The method is relevant to information retrieval and data mining.
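
The pipeline outlined in the abstract (term weighting, latent semantic analysis, clustering, co-occurrence-based sub-topics) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the toy corpus is hypothetical, plain TF-IDF stands in for the paper's improved TFIDF (whose definition is not given in the abstract), single-pass clustering with a cosine threshold stands in for the unspecified clustering approach, and the function names are invented. NumPy is assumed to be available for the singular value decomposition.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np

# Toy, pre-tokenized "news" corpus (hypothetical data, not from the paper).
DOCS = [
    ["election", "vote"],
    ["ballot", "poll"],
    ["election", "vote", "ballot", "poll"],
    ["match", "goal"],
    ["goal", "team"],
    ["match", "team"],
]


def tfidf_matrix(docs):
    """Term-document matrix with plain TF-IDF weights: count * log(N / df)."""
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    df = Counter(w for d in docs for w in set(d))
    mat = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w, count in Counter(d).items():
            mat[index[w], j] = count * math.log(len(docs) / df[w])
    return mat, vocab


def lsa_doc_vectors(mat, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    _, s, vt = np.linalg.svd(mat, full_matrices=False)
    return (s[:k, None] * vt[:k]).T  # one row per document


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def single_pass_cluster(vectors, threshold=0.5):
    """Greedy clustering: join the most similar cluster or open a new one."""
    centroids, labels = [], []
    for v in vectors:
        sims = [cosine(v, c) for c in centroids]
        if sims and max(sims) >= threshold:
            best = int(np.argmax(sims))
            centroids[best] = centroids[best] + v  # unnormalized centroid
            labels.append(best)
        else:
            centroids.append(v.copy())
            labels.append(len(centroids) - 1)
    return labels


def top_cooccurrences(docs, n=2):
    """Most frequent word pairs that appear together in the same document."""
    pairs = Counter()
    for d in docs:
        pairs.update(combinations(sorted(set(d)), 2))
    return [pair for pair, _ in pairs.most_common(n)]


matrix, vocab = tfidf_matrix(DOCS)
doc_vecs = lsa_doc_vectors(matrix, k=2)
labels = single_pass_cluster(doc_vecs)

vsm_sim = cosine(matrix[:, 0], matrix[:, 1])   # vector space model
lsa_sim = cosine(doc_vecs[0], doc_vecs[1])     # latent semantic space
subtopics = top_cooccurrences(
    [d for d, c in zip(DOCS, labels) if c == labels[0]]
)
```

In this toy corpus the first two documents share no words, so their TF-IDF cosine similarity is zero. In the two-dimensional latent space they become nearly parallel, because the third document links their vocabularies. This is the effect the abstract attributes to latent semantic analysis. The co-occurring pairs returned for the first cluster are only a rough stand-in for the sub-topic vector the abstract mentions.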

Keywords

Topic discovery, Weight computation, Latent semantic analysis, Similarity

Notes

Acknowledgements

This work was supported by the Ministry of Education Humanities and Social Science project, the Beijing Social Science Foundation (Grant: 14WYB040) and the MUC 111 project.

Copyright information

© The Natural Computing Applications Forum 2018

Authors and Affiliations

  • Guixian Xu (1, Email author)
  • Ziheng Yu (1)
  • Changzhi Wang (1)
  • Antai Wang (2)
  1. College of Information Engineering, Minzu University of China, Beijing, China
  2. Department of Mathematical Sciences, New Jersey Institute of Technology, Newark, USA