Abstract
Sina micro-blog (Weibo) is the first micro-blogging service in China and has grown rapidly over the past two years. This paper first studies the characteristics of the Sina online social network and then focuses on the problems of identifying influential users and automatically classifying micro-blog posts. In a dataset prepared for this study, we find, among other things, an approximate power-law follower distribution, a non-power-law friend distribution, and a log correlation between follower number and tweet number. To find the most popular users, we propose an algorithm called XinRank and compare it with two other algorithms. The results show that XinRank ranks users differently and offers a new perspective for finding influential users. In addition, our algorithm is dynamic and stable, which distinguishes it from the other two algorithms and gives it an advantage over them. We also attempt to automatically classify a single Chinese micro-blog post into a set of high-level categories using a naive Bayes classifier. Our research indicates that even though an average micro-blogging post in Chinese is only 28 words long, posts can be categorized into one of eight categories with an average performance of up to 84.2 % using our proposed process. Toward the end of the paper, we address the problem of automatic user interest discovery. Finally, we combine XinRank and our micro-blog classifier to propose an interest-based influence ranking model.
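To make the classification step concrete, the following is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. It is not the process used in the paper: the function names `train` and `classify`, the toy posts, and the two category labels are all hypothetical, and the sketch assumes each post has already been segmented into word tokens (a step that Chinese text would require beforehand).

```python
import math
from collections import Counter, defaultdict


def train(labeled_posts):
    """Collect the counts a multinomial naive Bayes model needs.

    labeled_posts is an iterable of (tokens, category) pairs, where tokens
    is a list of already-segmented words (an assumption of this sketch).
    """
    doc_counts = Counter()              # number of posts per category
    word_counts = defaultdict(Counter)  # token frequencies per category
    vocab = set()                       # all tokens seen in training
    for tokens, category in labeled_posts:
        doc_counts[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab


def classify(model, tokens):
    """Return the category with the highest log posterior for the tokens."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Log prior: fraction of training posts in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokens:
            if token not in vocab:
                continue  # skip tokens never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[category][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Made-up training data with two hypothetical categories.
toy_posts = [
    (["match", "goal", "team"], "sports"),
    (["team", "coach", "win"], "sports"),
    (["stock", "market", "bank"], "finance"),
    (["bank", "loan", "rate"], "finance"),
]
model = train(toy_posts)
predicted = classify(model, ["team", "goal"])
```

Working in log space keeps the product of many small probabilities from underflowing, which matters even for short posts once the vocabulary grows large.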
References
Benevenuto F (2009) Characterizing user behavior in online social networks. In: Proceedings of ACM SIGCOMM internet measurement conference. ACM, New York, pp 49–62
Cheng A, Evans M (2009) Inside Twitter: an in-depth look inside the Twitter world. http://www.sysyomos.com/insidetwitter/
Durant K, Smith M (2006) Mining sentiment classification from political web logs. In: Proceedings of the workshop on web mining and web usage analysis of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, Philadelphia
Fagin R (2003) Comparing top k lists. In: Proceedings of the 14th annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics. ACM, New York, pp 28–36
Gilad M, Rijke M (2006) Language model mixtures for contextual ad placement in personal blogs. In: Proceedings of the 5th international conference on natural language processing. Turku, Finland
Google Directory (2011) http://directory.google.com/. Last visited: 25 October 2011
Java A, Song X, Finin T, Tseng B (2007) Why we twitter: understanding microblogging usage and communities. In: WebKDD/SNA-KDD’07: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis. ACM, New York, pp 56–65
Joachims T (2011) A statistical learning model of text classification for support vector machines. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval. pp 128–136
Kalita JK (2002) Naive Bayes Classifiers for Spam Detection. MXLogic, Inc. Colorado Springs, CO., Colorado
Krishnamurthy B (2009) A measure of Online Social Networks. In: Proceedings of COMSNETS’09
Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a Social Network or a News Media? In: the International World Wide Web Conference Committee (IW3C2). ACM, New York, pp 591–600
Leskovec J, Adamic LA, Huberman BA (2006) The dynamics of viral marketing. In: Proceedings of the 7th ACM conference on Electronic commerce. ACM, New York, pp 228–237
Manning CD, Raghavan P, Schutze H (2008) Introduction to information retrieval, 1st edn. Cambridge University Press, Cambridge
McCown F, Nelson ML (2007) Agreeing to disagree: search engines and their public interfaces. In: Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries. ACM, New York, pp 309–318
Page L, Brin S, Motwani R (1999) The pagerank citation ranking: bringing order to the web. Technical Report 1999-66, Stanford InfoLab
Sharifi B (2010) Automatic microblog classification and summarization. Master’s thesis, University of Colorado at Colorado Springs, Colorado
TunkRank (2011) http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/. Last visited 25 March 2011
Weng J, Lim E-P, Jiang J, He Q (2010) Twitterrank: finding topic-sensitive influential twitterers. In: Proceedings of the third ACM international conference on Web search and data mining. ACM, New York, pp 261–270
Wu X, Wang J (2011) How about micro-blogging service in China: analysis and mining on sina micro-blog. In: Proceedings of 1st international symposium on From digital footprints to social and community intelligence. ACM, New York, pp 37–42
Acknowledgments
The work in this paper is in part supported by the National Natural Science Foundation of China under Grant No. 61073132; the Natural Science Foundation of Guangdong Province of China under Grant No. 915102750100-0035; Guangdong Province scientific and technological project under Grant No. 2009B010800017 and the Fundamental Research Funds for the Central Universities (101gpy33).
Additional information
This paper is an extended version of a paper appearing in UbiComp’11: Proceedings of 1st International Symposium on From Digital Footprints to Social and Community Intelligence. ACM, New York, pp. 37–42 (Wu and Wang 2011).
Cite this article
Wu, X., Wang, J. Micro-blog in China: identify influential users and automatically classify posts on Sina micro-blog. J Ambient Intell Human Comput 5, 51–63 (2014). https://doi.org/10.1007/s12652-012-0121-3
DOI: https://doi.org/10.1007/s12652-012-0121-3