Understanding the Top Grass Roots in Sina-Weibo

  • Ze Huang
  • Bo Yuan
  • Xuelei Hu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7751)

Abstract

Microblogging is now popular among everyday web users in China, who are commonly called "grass roots" on Sina-Weibo, a major microblogging service similar to Twitter. In this paper, we investigate the properties of messages published by this group of users and classify the messages into topic categories using text classification methods based on the Bag of Words (BOW) model. We find that, using Naïve Bayes, it is possible to achieve high accuracy in recognizing the topic of a message, but the popularity of a message cannot be reliably predicted from its content. These findings are further explored with visualization techniques.
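As a rough illustration of the approach the abstract names (Bag of Words features fed to a Naïve Bayes classifier), the sketch below implements a small multinomial Naïve Bayes with add-one smoothing. It is not the authors' implementation: the function names, the toy English tokens, and the two categories are all assumptions made for illustration, and real Sina-Weibo messages would first need Chinese word segmentation, which is not shown here.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect bag-of-words counts per category.

    docs   -- list of token lists (one list per message); hypothetical toy input
    labels -- list of category names, aligned with docs
    """
    class_counts = Counter(labels)        # number of messages per category
    word_counts = defaultdict(Counter)    # category -> word frequency table
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the category with the highest log-posterior for the tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (invented for this sketch, not from the paper).
train_docs = [
    ["team", "wins", "match"],
    ["player", "scores", "goal"],
    ["stock", "market", "falls"],
    ["bank", "raises", "rates"],
]
train_labels = ["sports", "sports", "finance", "finance"]

model = train_naive_bayes(train_docs, train_labels)
sports_guess = predict(model, ["goal", "match"])
finance_guess = predict(model, ["market", "rates"])
```

Word order is discarded entirely, which is the defining simplification of the BOW model: each message is reduced to the counts of the words it contains.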

Keywords

Microblogging, Text Classification, Bag of Words, Visualization

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ze Huang (1)
  • Bo Yuan (1)
  • Xuelei Hu (2)
  1. Intelligent Computing Lab, Division of Informatics, Graduate School at Shenzhen, Tsinghua University, Shenzhen, P.R. China
  2. School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, P.R. China
