Abstract
Although topic detection and tracking techniques have made great progress, most researchers have paid little attention to two aspects. First, the construction of a topic model does not take the characteristics of different topics into consideration. Second, the factors that determine the formation and development of hot topics are not analyzed in depth. To extract news blog hot topics accurately, this paper views these problems from a new perspective based on the W2T (Wisdom Web of Things) methodology, in which the characteristics of blog users, the context of topic propagation, and information granularity are investigated in a unified way. The motivations and features of blog users are first analyzed to understand the characteristics of news blog topics. The context of topic propagation is then decomposed into the blog community, the topic network, and the opinion network. Important factors such as user behavior patterns, opinion leaders, and network opinion are identified to track the development trends of news blog topics. Moreover, a blog hot topic detection algorithm is proposed, in which news blog hot topics are identified by measuring duration, topic novelty, the attention degree of users, and topic growth. Experimental results show that the proposed method is feasible and effective. These results are also useful for further study of the formation mechanism of opinion leaders in blogspace.
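The abstract names four measurements (duration, topic novelty, attention degree of users, and topic growth) but does not give their formulas. The following is only a minimal sketch of how four such factors might be combined into a ranking. The `TopicStats` fields, the individual factor definitions, the min-max normalisation, and the equal weights are all assumptions made for illustration; they are not the algorithm proposed in the paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TopicStats:
    """Hypothetical per-topic statistics; the field names are illustrative."""
    name: str
    first_seen: date
    last_seen: date
    daily_posts: list  # number of posts per day, oldest first
    views: int
    comments: int


def raw_factors(t, today):
    """Return [duration, novelty, attention, growth] under assumed definitions."""
    # Duration: number of days the topic has been active (assumed definition).
    duration = (t.last_seen - t.first_seen).days + 1
    # Novelty: decays with the age of the topic (assumed definition).
    novelty = 1.0 / (1 + (today - t.first_seen).days)
    # Attention degree: views plus comments (assumed definition).
    attention = t.views + t.comments
    # Growth: relative change in posts between the two halves of the history.
    mid = len(t.daily_posts) // 2
    early, late = sum(t.daily_posts[:mid]), sum(t.daily_posts[mid:])
    growth = (late - early) / max(early, 1)
    return [duration, novelty, attention, growth]


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_hot_topics(topics, today, weights=(0.25, 0.25, 0.25, 0.25)):
    """Rank topics by a weighted sum of normalised factors (assumed scheme)."""
    rows = [raw_factors(t, today) for t in topics]
    # Normalise each factor across all candidate topics.
    columns = [min_max(list(col)) for col in zip(*rows)]
    scores = [
        sum(w * col[i] for w, col in zip(weights, columns))
        for i in range(len(topics))
    ]
    ranked = sorted(zip(topics, scores), key=lambda p: p[1], reverse=True)
    return [(t.name, round(s, 3)) for t, s in ranked]
```

Normalising each factor across the candidate set keeps a large raw quantity such as view counts from dominating the score; the weights would need to be tuned against labelled data before the ranking meant anything in practice.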
Cite this article
Zhou, E., Zhong, N. & Li, Y. Extracting news blog hot topics based on the W2T Methodology. World Wide Web 17, 377–404 (2014). https://doi.org/10.1007/s11280-013-0207-7
DOI: https://doi.org/10.1007/s11280-013-0207-7