A Modified Approach of Hot Topics Found on Micro-blog
Because of its simplicity, immediacy, and convenience, micro-blogging is attracting growing attention from many kinds of users, and especially from researchers. Topic detection on micro-blogs has recently drawn increasing interest, largely because of the rapid growth of micro-blogging. However, retrieving information from micro-blogs is challenging: the texts are short, ungrammatical, unstructured, and noisy. As a result, traditional hot topic detection methods perform poorly on them. To address this problem, this paper proposes a hot topic detection method based on growth speed. In this method, the preprocessed micro-blogs are divided into time windows, and the time information is extracted for each window. Each word is then represented as a feature trajectory in the form of a binary group sequence. Next, the growth speed of each word, and of the users associated with that word, is calculated between every two adjacent windows, and the words whose growth speed exceeds a given threshold are selected as hot keywords. Finally, hot topics are found by clustering the hot keywords. Experiments were carried out on a SINA micro-blog dataset, using the miss rate and the false detection rate to assess the feasibility of the algorithm. The results show that the method improves the efficiency of detection to a certain extent.
Keywords: Time information; Growth speed; Feature trajectory of binary group sequence
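The windowing, feature-trajectory, and growth-speed steps outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions, because the abstract does not define them: each "binary group" is taken to be the pair (number of posts containing the word, number of distinct users posting it) in one window; growth speed is taken to be the relative increase between adjacent windows with +1 smoothing; and a word counts as hot only when both its post growth and its user growth exceed the threshold. The function names and the `(timestamp, user, words)` post format are hypothetical, and the final clustering step is omitted.

```python
from collections import defaultdict


def feature_trajectories(posts, window_size, num_windows):
    """Map each word to a list of (post_count, user_count) pairs, one per window.

    posts: iterable of (timestamp, user, words) tuples (hypothetical format).
    """
    counts = defaultdict(lambda: [0] * num_windows)
    users = defaultdict(lambda: [set() for _ in range(num_windows)])
    for timestamp, user, words in posts:
        w = int(timestamp // window_size)
        if not 0 <= w < num_windows:
            continue  # ignore posts outside the observed time range
        for word in set(words):  # count each word once per post
            counts[word][w] += 1
            users[word][w].add(user)
    return {
        word: [(counts[word][i], len(users[word][i])) for i in range(num_windows)]
        for word in counts
    }


def growth_speed(prev, curr):
    # Assumed definition: relative increase, smoothed so that prev == 0 is safe.
    return (curr - prev) / (prev + 1)


def hot_keywords(trajectories, threshold):
    """Return words whose post and user growth both exceed the threshold
    between at least one pair of adjacent windows (assumed selection rule)."""
    hot = set()
    for word, trajectory in trajectories.items():
        for (f0, u0), (f1, u1) in zip(trajectory, trajectory[1:]):
            if growth_speed(f0, f1) > threshold and growth_speed(u0, u1) > threshold:
                hot.add(word)
                break
    return hot


# Toy usage: two windows of 10 time units each.
posts = [
    (0, "a", ["rain"]),
    (1, "b", ["match"]),
    (10, "a", ["match", "goal"]),
    (11, "b", ["match"]),
    (12, "c", ["match"]),
    (13, "d", ["match", "rain"]),
]
trajectories = feature_trajectories(posts, window_size=10, num_windows=2)
hot = hot_keywords(trajectories, threshold=1.0)
```

Tracking distinct users alongside raw counts means a word repeated many times by a single account does not qualify as hot under this rule, which is one plausible reading of why the abstract considers both the word and its related users.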
This research was supported by the National Natural Science Foundation of China (No. 60873247), the Natural Science Foundation of Shandong Province of China (Nos. ZR2009GZ007, ZR2011FM030), and the National Social Science Foundation of China (No. 12BXW040).