Abstract
Social media are one of the main contributors of user-generated content, providing vast amounts of data on a daily basis and covering a wide range of topics, interests and events. To identify and link meaningful and relevant information, clustering algorithms have been used to partition this content. We have found, however, that these algorithms exhibit various shortcomings when dealing with social media text, which is dynamic and streaming in nature. We therefore explore the idea of estimating the algorithms’ parameters from observations of how the clusters’ properties (such as centroid, shape and density) evolve. By experimenting with these properties, we propose a methodological framework that detects the evolution of the clusters’ centroid, shape and density and explores their role in parameter estimation.
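To make the idea of tracking cluster properties over time concrete, the following is a minimal sketch and not the framework proposed in the paper. It assumes a cluster is a list of numeric feature vectors observed in two consecutive time windows, and it uses simple stand-ins chosen only for illustration: the mean vector for the centroid, per-dimension variance for shape, and point count divided by mean distance to the centroid for density. All function names are hypothetical.

```python
from math import sqrt


def centroid(points):
    """Mean vector of the points in a cluster."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def variance_per_dimension(points, center):
    """Population variance along each dimension (a crude proxy for shape)."""
    n = len(points)
    return [
        sum((p[d] - center[d]) ** 2 for p in points) / n
        for d in range(len(center))
    ]


def mean_radius(points, center):
    """Average Euclidean distance from the points to the centroid."""
    total = 0.0
    for p in points:
        total += sqrt(sum((x - c) ** 2 for x, c in zip(p, center)))
    return total / len(points)


def summarize(points):
    """Centroid, shape and density summary for one time window.

    Assumes the window holds at least two distinct points, so that the
    mean radius is non-zero.
    """
    if not points:
        raise ValueError("a window must contain at least one point")
    center = centroid(points)
    radius = mean_radius(points, center)
    if radius == 0:
        raise ValueError("density is undefined when all points coincide")
    return {
        "centroid": center,
        "variance": variance_per_dimension(points, center),
        "density": len(points) / radius,  # illustrative definition only
    }


def drift(previous, current):
    """Change in each property between two window summaries."""
    shift = sqrt(
        sum((a - b) ** 2
            for a, b in zip(previous["centroid"], current["centroid"]))
    )
    return {
        "centroid_shift": shift,
        "variance_change": [
            b - a for a, b in zip(previous["variance"], current["variance"])
        ],
        "density_ratio": current["density"] / previous["density"],
    }


# Two windows of the same cluster: the second is a pure translation of the
# first, so only the centroid should move.
window_1 = [[0, 0], [2, 0], [0, 2], [2, 2]]
window_2 = [[3, 4], [5, 4], [3, 6], [5, 6]]
change = drift(summarize(window_1), summarize(window_2))
print(change)
```

In a sketch like this, a large centroid shift with stable variance and density would suggest that the cluster has moved without changing form, whereas a change in variance or density would point to a change in shape or compactness. How such signals should feed into the estimation of a clustering algorithm's parameters is the question the paper itself addresses, and is not captured here.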
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kleisarchaki, S., Kotzinos, D., Tsamardinos, I., Christophides, V. (2013). A Methodological Framework for Statistical Analysis of Social Text Streams. In: Tanaka, Y., Spyratos, N., Yoshida, T., Meghini, C. (eds) Information Search, Integration and Personalization. ISIP 2012. Communications in Computer and Information Science, vol 146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40140-4_11
DOI: https://doi.org/10.1007/978-3-642-40140-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40139-8
Online ISBN: 978-3-642-40140-4
eBook Packages: Computer Science, Computer Science (R0)