Micro-blog in China: identify influential users and automatically classify posts on Sina micro-blog

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Sina micro-blog (Weibo) is the first micro-blogging service in China and has grown rapidly over the past two years. This paper first studies the characteristics of the Sina online social network and then focuses on two problems: identifying influential users and automatically classifying micro-blog posts. In a dataset prepared for this study, we find, among other properties, an approximate power-law follower distribution, a non-power-law friend distribution, and a log correlation between follower number and tweet number. To find the most popular users, we propose an algorithm called XinRank and compare it with two other algorithms. The results show that XinRank ranks users differently and offers a new perspective for finding influential users. In addition, our algorithm is dynamic and stable, which distinguishes it from the other two algorithms and makes it better than them. We then attempt to automatically classify a single Chinese micro-blog post into a set of high-level categories using a naive Bayes classifier. Our research indicates that even though an average Chinese micro-blog post is only 28 words long, posts can be categorized into one of eight categories with an average performance of up to 84.2 % using our proposed process. At the end of the paper, we address the problem of automatic user interest discovery. Finally, we combine XinRank and our micro-blog classifier to propose an interest-based influence ranking model.
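
To make the classification step concrete, the following is a minimal sketch of a multinomial naive Bayes text classifier with add-one (Laplace) smoothing. It is not the process used in the paper: the class name `NaiveBayes`, the smoothing parameter `alpha`, the two toy categories, and the English example tokens are all assumptions made for illustration, and posts are assumed to be already segmented into word tokens (Chinese text would need a word-segmentation step first).

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Illustrative multinomial naive Bayes over pre-tokenized posts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing strength (assumed value)
        self.class_counts = Counter()            # number of training posts per category
        self.word_counts = defaultdict(Counter)  # per-category token frequencies
        self.vocab = set()                       # all tokens seen during training

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n / n_docs)
            for t in tokens:
                if t not in self.vocab:
                    continue  # tokens never seen in training are ignored
                count = self.word_counts[label][t]
                score += math.log((count + self.alpha) / (total + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (hypothetical; the paper uses eight categories).
docs = [
    ["football", "match", "goal"],
    ["team", "wins", "match"],
    ["stock", "market", "falls"],
    ["bank", "raises", "rates"],
]
labels = ["sports", "sports", "finance", "finance"]

model = NaiveBayes().fit(docs, labels)
print(model.predict(["match", "goal"]))    # sports
print(model.predict(["market", "rates"]))  # finance
```

Scores are computed in log space so that products of many small probabilities do not underflow, which matters even for short posts once the vocabulary is large.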

References

  • Benevenuto F (2009) Characterizing user behavior in online social networks. In: Proceedings of ACM SIGCOMM internet measurement conference. ACM, New York, pp 49–62

  • Cheng A, Evans M (2009) Inside Twitter: an in-depth look inside the Twitter world. http://www.sysyomos.com/insidetwitter/

  • Durant K, Smith M (2006) Mining sentiment classification from political web logs. In: Proceedings of the workshop on web mining and web usage analysis of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, Philadelphia

  • Fagin R (2003) Comparing top k lists. In: Proceedings of the 14th annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics. ACM, New York, pp 28–36

  • Mishne G, de Rijke M (2006) Language model mixtures for contextual ad placement in personal blogs. In: Proceedings of the 5th international conference on natural language processing. Turku, Finland

  • Google Directory (2011) http://directory.google.com/. Last visited: 25 October 2011

  • Java A, Song X, Finin T, Tseng B (2007) Why we twitter: understanding microblogging usage and communities. In: WebKDD/SNA-KDD’07: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis. ACM, New York, pp 56–65

  • Joachims T (2001) A statistical learning model of text classification for support vector machines. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval. pp 128–136

  • Kalita JK (2002) Naive Bayes classifiers for spam detection. MXLogic, Inc., Colorado Springs, Colorado

  • Krishnamurthy B (2009) A measure of online social networks. In: Proceedings of COMSNETS’09

  • Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th international conference on World Wide Web. ACM, New York, pp 591–600

  • Leskovec J, Adamic LA, Huberman BA (2006) The dynamics of viral marketing. In: Proceedings of the 7th ACM conference on Electronic commerce. ACM, New York, pp 228–237

  • Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval, 1st edn. Cambridge University Press, Cambridge

  • McCown F, Nelson ML (2007) Agreeing to disagree: search engines and their public interfaces. In: Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries. ACM, New York, pp 309–318

  • Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the web. Technical Report 1999-66, Stanford InfoLab

  • Sharifi B (2010) Automatic microblog classification and summarization. Master’s thesis, University of Colorado at Colorado Springs, Colorado

  • TunkRank (2011) http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/. Last visited 25 March 2011

  • Weng J, Lim E-P, Jiang J, He Q (2010) TwitterRank: finding topic-sensitive influential twitterers. In: Proceedings of the third ACM international conference on Web search and data mining. ACM, New York, pp 261–270

  • Wu X, Wang J (2011) How about micro-blogging service in China: analysis and mining on sina micro-blog. In: Proceedings of 1st international symposium on From digital footprints to social and community intelligence. ACM, New York, pp 37–42

Acknowledgments

The work in this paper is in part supported by the National Natural Science Foundation of China under Grant No. 61073132; the Natural Science Foundation of Guangdong Province of China under Grant No. 915102750100-0035; Guangdong Province scientific and technological project under Grant No. 2009B010800017 and the Fundamental Research Funds for the Central Universities (101gpy33).

Author information

Corresponding authors

Correspondence to Xinmiao Wu or Jianmin Wang.

Additional information

This paper is an extended version of a paper that appeared in UbiComp’11: Proceedings of 1st International Symposium on From Digital Footprints to Social and Community Intelligence. ACM, New York, pp. 37–42 (Wu and Wang 2011).

About this article

Cite this article

Wu, X., Wang, J. Micro-blog in China: identify influential users and automatically classify posts on Sina micro-blog. J Ambient Intell Human Comput 5, 51–63 (2014). https://doi.org/10.1007/s12652-012-0121-3

  • DOI: https://doi.org/10.1007/s12652-012-0121-3
