Abstract
Microblogging is a recent social phenomenon of Web 2.0 technology with applications in many domains. It is a form of social media, recognized as real-time web publishing, that has won impressive audience acceptance and changed online expression and interaction for millions of users. Clustering by topic can be very helpful for quick retrieval of desired information. We propose a novel topic detection technique that retrieves, in real time, the most emergent topics expressed by the community. Traditional text mining techniques make no special allowance for short and sparse microblog data. In view of these characteristics, we adopt single-pass clustering and use the Latent Dirichlet Allocation (LDA) model in place of the traditional vector space model (VSM) to extract hidden microblog topic information. Experiments on a real dataset showed that the proposed method decreased the probabilities of miss and false alarm and reduced the normalized detection cost.
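The following is a minimal sketch of the general idea described above, not the authors' implementation. It assumes each microblog post has already been mapped to a document-topic probability vector by some LDA inference step (not shown). The cosine similarity measure, the running-mean centroid update, the threshold value, and the names `single_pass_cluster` and `normalized_detection_cost` are illustrative assumptions. The cost function follows the usual Topic Detection and Tracking form, with default constants (`c_miss=1.0`, `c_fa=0.1`, `p_target=0.02`) that are assumed here rather than taken from the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_pass_cluster(topic_vectors, threshold=0.8):
    """Assign each vector, in arrival order, to the most similar cluster.

    A new cluster is opened when no existing centroid reaches `threshold`
    (an assumed, tunable value). Each post is visited exactly once.
    """
    centroids = []  # running mean vector per cluster
    sizes = []      # number of posts per cluster
    labels = []     # cluster index assigned to each post
    for vec in topic_vectors:
        best, best_sim = -1, -1.0
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= threshold:
            # Fold the post into the matched cluster's running mean.
            n = sizes[best]
            centroids[best] = [
                (c * n + x) / (n + 1) for c, x in zip(centroids[best], vec)
            ]
            sizes[best] = n + 1
            labels.append(best)
        else:
            # No cluster is close enough: treat the post as a new topic.
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids


def normalized_detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Detection cost normalized by the cheaper of the two trivial systems."""
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return cost / min(c_miss * p_target, c_fa * (1.0 - p_target))


# Hypothetical document-topic vectors for five posts over three topics.
posts = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.10, 0.80],
    [0.05, 0.85, 0.10],
]
labels, centroids = single_pass_cluster(posts, threshold=0.8)
print(labels)
print(normalized_detection_cost(p_miss=0.1, p_fa=0.01))
```

Because each post is compared only against the current centroids and never revisited, the procedure suits a stream of incoming posts; the trade-off is that results depend on arrival order and on the chosen threshold.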
This work is partially supported by the National Science Foundation of China (Nos. 61170111, 61003142 and 61152001) and the Fundamental Research Funds for the Central Universities (No. SWJTU11ZT08).
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, B., Yang, Y., Mahmood, A., Wang, H. (2012). Microblog Topic Detection Based on LDA Model and Single-Pass Clustering. In: Yao, J., et al. Rough Sets and Current Trends in Computing. RSCTC 2012. Lecture Notes in Computer Science, vol 7413. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32115-3_19
DOI: https://doi.org/10.1007/978-3-642-32115-3_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32114-6
Online ISBN: 978-3-642-32115-3
eBook Packages: Computer Science (R0)