Microblog Topic Detection Based on LDA Model and Single-Pass Clustering

  • Conference paper
Rough Sets and Current Trends in Computing (RSCTC 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7413)

Abstract

Microblogging is a recent social phenomenon of Web 2.0 technology with applications in many domains. It is a form of social media, recognized as real-time web publishing, that has won impressive audience acceptance and changed online expression and interaction for millions of users. Clustering microblog posts by topic can greatly speed up the retrieval of desired information. We propose a novel topic detection technique that retrieves, in real time, the most emergent topics expressed by the community. Traditional text mining techniques make no special provision for short and sparse microblog data. In view of these characteristics, we adopt the Single-pass Clustering technique, using the Latent Dirichlet Allocation (LDA) model in place of the traditional vector space model (VSM), to extract hidden microblog topic information. Experiments on a real dataset showed that the proposed method decreased the miss and false alarm probabilities and reduced the normalized detection cost.
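As an illustration of the approach summarized above, the following minimal sketch clusters posts in a single pass over their document-topic vectors and computes a normalized detection cost. It is not the authors' implementation. The topic vectors are assumed to have been inferred beforehand by an LDA model; the cosine similarity measure, the running-mean centroid update, the threshold of 0.8, and the function names are assumptions made for illustration. The cost parameters (C_miss = 1, C_fa = 0.1, P_target = 0.02) are likewise assumed defaults in the style of the TDT evaluations, not values confirmed by this abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def single_pass_cluster(doc_topics, threshold=0.8):
    """Assign each document-topic vector to a cluster in one pass.

    doc_topics: iterable of topic distributions (assumed LDA output).
    threshold:  assumed similarity cut-off for joining an existing cluster.
    Returns a list with one cluster label per document, in arrival order.
    """
    centroids = []  # running mean topic vector of each cluster
    sizes = []      # number of documents in each cluster
    labels = []
    for vec in doc_topics:
        # Find the most similar existing cluster, if any.
        best, best_sim = -1, -1.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            # Join the cluster and update its centroid incrementally.
            n = sizes[best]
            centroids[best] = [
                (c * n + x) / (n + 1) for c, x in zip(centroids[best], vec)
            ]
            sizes[best] = n + 1
            labels.append(best)
        else:
            # No cluster is similar enough: the post seeds a new topic.
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels


def normalized_detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Detection cost normalized by the cost of the better trivial system.

    Default cost parameters are assumptions, not taken from the paper.
    """
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return cost / min(c_miss * p_target, c_fa * (1.0 - p_target))


if __name__ == "__main__":
    # Hypothetical topic distributions for five short posts over three topics.
    posts = [
        [0.90, 0.05, 0.05],
        [0.85, 0.10, 0.05],
        [0.05, 0.90, 0.05],
        [0.10, 0.05, 0.85],
        [0.05, 0.88, 0.07],
    ]
    print(single_pass_cluster(posts))
    print(normalized_detection_cost(p_miss=0.2, p_fa=0.01))
```

Because each post is compared only against the current cluster centroids, the procedure never revisits earlier posts, which is what makes it suitable for a streaming setting. Working in the low-dimensional topic space rather than a sparse term space is one plausible way to soften the sparsity of short texts.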

This work is partially supported by the National Science Foundation of China (Nos. 61170111, 61003142 and 61152001) and the Fundamental Research Funds for the Central Universities (No. SWJTU11ZT08).

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, B., Yang, Y., Mahmood, A., Wang, H. (2012). Microblog Topic Detection Based on LDA Model and Single-Pass Clustering. In: Yao, J., et al. Rough Sets and Current Trends in Computing. RSCTC 2012. Lecture Notes in Computer Science, vol 7413. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32115-3_19

  • DOI: https://doi.org/10.1007/978-3-642-32115-3_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32114-6

  • Online ISBN: 978-3-642-32115-3

  • eBook Packages: Computer Science, Computer Science (R0)
