Searching for Quality Microblog Posts: Filtering and Ranking Based on Content Analysis and Implicit Links

  • Jan Vosecky
  • Kenneth Wai-Ting Leung
  • Wilfred Ng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7238)

Abstract

Social networking has become a popular web activity, with millions of people creating a large amount of information every day. However, the study of effective search over such social information is still in its infancy. In this paper, we focus on Twitter, a rapidly growing microblogging platform whose content is large in volume, diverse, and of varying quality. To surface higher-quality content (e.g. posts mentioning news, events, useful facts or well-formed opinions) when a user searches for tweets, we propose a new method to filter and rank tweets according to their quality. To model tweet quality, we devise a new set of link-based features in addition to content-based features. We examine the implicit links between tweets, URLs, hashtags and users, and propose novel metrics that reflect the popularity as well as the quality-based reputation of websites, hashtags and users. We then evaluate both the content-based and link-based features in terms of classification effectiveness and identify an optimal feature subset that achieves the best classification accuracy. A detailed evaluation of our filtering and ranking models shows that the optimal feature subset outperforms the traditional bag-of-words representation while requiring significantly less computational time and storage. Moreover, we demonstrate that the proposed metrics based on implicit links are effective for determining tweet quality.
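
To make the idea of popularity and quality-based reputation derived from implicit links more concrete, the following is a minimal sketch for websites only. It is not the metric defined in the paper: the function name `domain_stats`, the input format (pairs of a URL and a binary quality label) and the two formulas (popularity as a raw count of linking tweets, reputation as the fraction of those tweets labelled high quality) are assumptions made purely for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain_stats(tweets):
    """Aggregate per-domain statistics from labelled tweets.

    `tweets` is an iterable of (url, is_high_quality) pairs. This input
    format is hypothetical and not taken from the paper.
    """
    # domain -> [number of linking tweets, number labelled high quality]
    counts = defaultdict(lambda: [0, 0])
    for url, is_high_quality in tweets:
        domain = urlparse(url).netloc.lower()
        counts[domain][0] += 1
        counts[domain][1] += int(is_high_quality)

    # Assumed definitions: popularity is the raw count of linking tweets,
    # reputation is the share of those tweets labelled high quality.
    return {
        domain: {"popularity": total, "reputation": good / total}
        for domain, (total, good) in counts.items()
    }


labelled = [
    ("http://news.example.org/a", True),
    ("http://news.example.org/b", True),
    ("http://news.example.org/c", False),
    ("http://spam.example.com/x", False),
]
stats = domain_stats(labelled)
```

Under these assumptions, a domain that is linked often and mostly from high-quality tweets scores high on both values, and the two numbers could serve as link-based features of any new tweet that links to that domain. The same aggregation could in principle be keyed by hashtag or by user instead of by domain.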

Keywords

Support Vector Machine, Information Gain, Feature Subset, Mean Average Precision, Ranking Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jan Vosecky (1)
  • Kenneth Wai-Ting Leung (1)
  • Wilfred Ng (1)
  1. Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
