Retrieving Information from Microblog Using Pattern Mining and Relevance Feedback

  • Cher Han Lau
  • Xiaohui Tao
  • Dian Tjondronegoro
  • Yuefeng Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7696)

Abstract

Retrieving information from Twitter is challenging because of its large volume, inconsistent writing and noise. Most existing information retrieval (IR) and text mining methods take a term-based approach, which suffers from term variation problems such as polysemy and synonymy. These problems worsen when such methods are applied to Twitter because of the length limit on tweets. It has long been hypothesised that pattern-based methods should perform better than term-based methods because patterns provide more context, but few studies have tested this hypothesis, especially on Twitter. This paper presents an innovative framework for performing IR on microblogs. The proposed framework discovers patterns in tweets as higher-level features and uses them to assign weights to low-level features (i.e. terms) based on their distributions in the higher-level features. We present experimental results on the TREC11 microblog dataset showing that the proposed approach significantly outperforms the term-based methods Okapi BM25 and TF-IDF, as well as pattern-based methods, in terms of precision, recall and F-measure.
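The idea of weighting terms by their distribution in discovered patterns can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`mine_frequent_patterns`, `deploy_term_weights`, `score`), the brute-force itemset enumeration, and the rule of splitting each pattern's support evenly across its terms are all assumptions made for illustration, and the tweets are toy data.

```python
from collections import defaultdict
from itertools import combinations


def mine_frequent_patterns(docs, min_support=2, max_len=2):
    """Return {frozenset(terms): support} for term sets found in >= min_support docs.

    Brute-force enumeration; adequate for short texts such as tweets,
    but an assumption here rather than the paper's mining algorithm.
    """
    counts = defaultdict(int)
    for doc in docs:
        terms = sorted(set(doc))
        for k in range(1, max_len + 1):
            for combo in combinations(terms, k):
                counts[frozenset(combo)] += 1
    return {p: s for p, s in counts.items() if s >= min_support}


def deploy_term_weights(patterns):
    """Map pattern-level evidence down to terms.

    Assumed rule: each pattern shares its support equally among its terms,
    and a term's weight is the sum of its shares over all patterns.
    """
    weights = defaultdict(float)
    for pattern, support in patterns.items():
        for term in pattern:
            weights[term] += support / len(pattern)
    return dict(weights)


def score(doc, weights):
    """Score a tweet as the sum of the weights of its distinct terms."""
    return sum(weights.get(term, 0.0) for term in set(doc))


# Toy relevance-feedback set: tokenised tweets judged relevant (hypothetical data).
relevant = [
    ["earthquake", "japan", "tsunami"],
    ["earthquake", "japan"],
    ["tsunami", "warning"],
]

patterns = mine_frequent_patterns(relevant, min_support=2, max_len=2)
weights = deploy_term_weights(patterns)

candidates = [["earthquake", "japan", "news"], ["tsunami", "warning"]]
ranked = sorted(candidates, key=lambda d: score(d, weights), reverse=True)
```

In this sketch, terms that co-occur in frequent patterns ("earthquake", "japan") accumulate more weight than a term that is frequent only on its own ("tsunami"), which is one way pattern context can influence term-level ranking.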

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Cher Han Lau
    • 1
  • Xiaohui Tao
    • 2
  • Dian Tjondronegoro
    • 1
  • Yuefeng Li
    • 1
  1. Queensland University of Technology, Brisbane, Australia
  2. University of Southern Queensland, Darling Heights, Australia
