Asia Information Retrieval Symposium

Information Retrieval Technology, pp. 135-146

Improving Tweet Timeline Generation by Predicting Optimal Retrieval Depth

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9460)

Abstract

Tweet Timeline Generation (TTG) systems provide users with informative and concise retrospective summaries of topics as they developed over time. To produce a tweet timeline that summarizes a given topic, a TTG system typically retrieves a list of potentially relevant tweets from which the timeline is then generated. In such a design, the performance of the timeline generation step inevitably depends on that of the retrieval step.

In this work, we aim to improve the performance of a given timeline generation system by controlling the depth of the ranked list of retrieved tweets considered in generating the timeline. We propose a supervised approach that predicts the optimal depth of the ranked tweet list for a given topic by combining estimates of list quality computed at different depths.
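The idea of predicting a per-topic cutoff from quality estimates at several depths can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the candidate depths, the use of the mean retrieval score of the top-k tweets as the quality estimate, the toy training pairs, and the nearest-neighbour regressor, which is swapped in only for brevity. The abstract does not state which quality estimators or which learner the paper uses.

```python
from statistics import mean

# Hypothetical candidate cutoffs; the abstract does not list the depths used.
DEPTHS = (10, 50, 100)


def quality_features(scores, depths=DEPTHS):
    """Return one list-quality estimate per depth.

    Assumption: quality at depth k is approximated by the mean retrieval
    score of the top-k tweets (scores are sorted best-first).
    """
    return [mean(scores[:k]) if scores[:k] else 0.0 for k in depths]


def predict_depth(train, features, k=2):
    """k-NN regression over (features, optimal_depth) training pairs.

    Averages the known optimal depths of the k training topics whose
    feature vectors are closest to `features` (squared Euclidean distance).
    """
    def sq_dist(pair):
        return sum((a - b) ** 2 for a, b in zip(pair[0], features))

    nearest = sorted(train, key=sq_dist)[:k]
    return round(mean(depth for _, depth in nearest))


def timeline_input(ranked_tweets, depth):
    """Truncate the ranked list before handing it to the TTG model."""
    return ranked_tweets[:depth]


if __name__ == "__main__":
    # Toy training topics: (quality features, depth that worked best).
    train = [
        ([0.90, 0.80, 0.70], 100),
        ([0.90, 0.50, 0.20], 10),
        ([0.85, 0.75, 0.70], 80),
    ]
    scores = [1.0 - 0.005 * i for i in range(200)]  # made-up retrieval scores
    depth = predict_depth(train, quality_features(scores))
    print(depth, len(timeline_input(list(range(200)), depth)))
```

In this sketch a topic whose scores decay slowly resembles the training topics with large optimal depths, so more of its ranked list is passed on to timeline generation; a topic whose scores drop off quickly is cut short.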

We conducted our experiments on a recent TREC TTG test collection of 243M tweets and 55 topics. We experimented with 14 different retrieval models (used to retrieve the initial ranked list of tweets) and 3 different TTG models (used to generate the final timeline). Our results demonstrate the effectiveness of the proposed approach: it improved TTG performance over a strong baseline in 76% of the cases, of which 31% were statistically significant, and no significant degradation was observed in any case.

Keywords

Tweet summarization; Microblogs; Dynamic retrieval cutoff; Query difficulty; Query performance prediction; Regression

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Computer Science and Engineering Department, College of Engineering, Qatar University, Doha, Qatar
  2. Qatar Computing Research Institute, HBKU, Doha, Qatar
