A Methodological Framework for Statistical Analysis of Social Text Streams

  • Conference paper
Information Search, Integration and Personalization (ISIP 2012)

Abstract

Social media are among the main sources of user-generated content, providing vast amounts of data on a daily basis and covering a wide range of topics, interests and events. To identify and link meaningful and relevant information, clustering algorithms have been used to partition this user-generated content. We have found, however, that these algorithms exhibit various shortcomings when dealing with social media text, which is dynamic and streaming in nature. We therefore explore the idea of estimating the algorithms’ parameters from observations of how the clusters’ properties (such as centroid, shape and density) evolve. By experimenting with these properties, we propose a methodological framework that detects the evolution of the clusters’ centroid, shape and density and explores their role in parameter estimation.
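As a rough illustration of what tracking these three properties between two time windows could look like, the sketch below summarises a cluster by its centroid, a crude shape descriptor and a density value, then reports how each changed. It is not the authors' method: the function names, the use of per-dimension variance for "shape", and the definition of density as points per unit of mean distance to the centroid are all assumptions made for this example.

```python
import math

def centroid(points):
    """Mean vector of a non-empty list of equal-length numeric vectors."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dim)]

def per_dimension_variance(points, center):
    """Diagonal of the covariance matrix; an assumed, crude 'shape' descriptor."""
    n = len(points)
    return [sum((p[d] - center[d]) ** 2 for p in points) / n
            for d in range(len(center))]

def density(points, center):
    """Points per unit of mean distance to the centroid (assumed definition)."""
    mean_radius = sum(math.dist(p, center) for p in points) / len(points)
    return len(points) / mean_radius if mean_radius > 0 else float("inf")

def cluster_summary(points):
    """Collect the three properties for one snapshot of a cluster."""
    c = centroid(points)
    return {
        "centroid": c,
        "shape": per_dimension_variance(points, c),
        "density": density(points, c),
    }

def evolution(prev_points, curr_points):
    """Compare two snapshots of the same cluster taken in consecutive windows."""
    prev = cluster_summary(prev_points)
    curr = cluster_summary(curr_points)
    return {
        "centroid_shift": math.dist(prev["centroid"], curr["centroid"]),
        "shape_change": [b - a for a, b in zip(prev["shape"], curr["shape"])],
        "density_ratio": curr["density"] / prev["density"],
    }

# Hypothetical 2-D document vectors for one cluster in two consecutive windows.
window_a = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
window_b = [[3.0, 3.0], [5.0, 3.0], [3.0, 5.0], [5.0, 5.0]]
change = evolution(window_a, window_b)
```

In this toy input the cluster is translated without being stretched or thinned, so only the centroid shift is non-trivial. In a streaming setting, such change signals could in principle feed a rule for adjusting a clustering parameter, but which rule to use is left open here.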

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kleisarchaki, S., Kotzinos, D., Tsamardinos, I., Christophides, V. (2013). A Methodological Framework for Statistical Analysis of Social Text Streams. In: Tanaka, Y., Spyratos, N., Yoshida, T., Meghini, C. (eds) Information Search, Integration and Personalization. ISIP 2012. Communications in Computer and Information Science, vol 146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40140-4_11

  • DOI: https://doi.org/10.1007/978-3-642-40140-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40139-8

  • Online ISBN: 978-3-642-40140-4

  • eBook Packages: Computer Science, Computer Science (R0)
