Abstract
Social networking has become a popular web activity, with millions of people creating a large amount of information every day. However, research on searching such social information effectively is still in its infancy. In this paper, we focus on Twitter, a rapidly growing microblogging platform whose content is large in volume, diverse, and of varying quality. To surface higher-quality content (e.g., posts mentioning news, events, useful facts, or well-formed opinions) when a user searches for tweets, we propose a new method to filter and rank tweets according to their quality. To model tweet quality, we devise a new set of link-based features in addition to content-based features. We examine the implicit links between tweets, URLs, hashtags, and users, and propose novel metrics that reflect the popularity and the quality-based reputation of websites, hashtags, and users. We then evaluate both the content-based and link-based features in terms of classification effectiveness and identify an optimal feature subset that achieves the best classification accuracy. A detailed evaluation of our filtering and ranking models shows that the optimal feature subset outperforms the traditional bag-of-words representation while requiring significantly less computational time and storage. Moreover, we demonstrate that the proposed metrics based on implicit links are effective for determining the quality of tweets.
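The abstract does not define the link-based metrics, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a hypothetical input of tweets that have already been labelled as high or low quality, treats each hashtag, website, or user linked to a tweet as an "entity", and takes popularity to be the number of tweets mentioning the entity and quality-based reputation to be the fraction of those tweets labelled high quality. The function name `reputation_scores` and the input layout are assumptions made for this example.

```python
from collections import defaultdict


def reputation_scores(tweets):
    """Compute (popularity, reputation) per entity.

    `tweets` is an iterable of (entities, is_high_quality) pairs, where
    `entities` lists the hashtags, URL domains, or user names linked to
    one tweet. This layout is hypothetical, chosen for illustration.
    """
    total = defaultdict(int)  # tweets mentioning each entity
    good = defaultdict(int)   # of those, tweets labelled high quality

    for entities, is_high_quality in tweets:
        # Count each entity once per tweet, even if it is repeated.
        for entity in set(entities):
            total[entity] += 1
            if is_high_quality:
                good[entity] += 1

    # Popularity is the raw count; reputation is the high-quality share.
    return {e: (total[e], good[e] / total[e]) for e in total}


# Toy labelled data, invented for this example.
tweets = [
    (["#news"], True),
    (["#news", "#lol"], False),
    (["#lol"], False),
    (["#news"], True),
]
scores = reputation_scores(tweets)
```

Scores of this kind could then be attached to each tweet as features alongside content-based ones; how the paper actually combines them is not stated in the abstract.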
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vosecky, J., Leung, K.WT., Ng, W. (2012). Searching for Quality Microblog Posts: Filtering and Ranking Based on Content Analysis and Implicit Links. In: Lee, Sg., Peng, Z., Zhou, X., Moon, YS., Unland, R., Yoo, J. (eds) Database Systems for Advanced Applications. DASFAA 2012. Lecture Notes in Computer Science, vol 7238. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29038-1_29
DOI: https://doi.org/10.1007/978-3-642-29038-1_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29037-4
Online ISBN: 978-3-642-29038-1
eBook Packages: Computer Science (R0)