Information Retrieval, Volume 17, Issue 1, pp 74–108

Using temporal bursts for query modeling

  • Maria-Hendrike Peetz
  • Edgar Meij
  • Maarten de Rijke

Abstract

We present an approach to query modeling that leverages the temporal distribution of documents in an initially retrieved set of documents. In news-related document collections such distributions tend to exhibit bursts. Here, we define a burst to be a time period where unusually many documents are published. In our approach we detect bursts in result lists returned for a query. We then model the term distributions of the bursts using a reduced result list and select its most descriptive terms. Finally, we merge the sets of terms obtained in this manner so as to arrive at a reformulation of the original query. For query sets that consist of both temporal and non-temporal queries, our query modeling approach incorporates an effective selection method of terms. We consistently and significantly improve over various baselines, such as relevance models, on both news collections and a collection of blog posts.
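
The pipeline summarized above (detect bursts in a result list, select descriptive terms from the bursts, merge them into the query) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the burst criterion (a day whose document count exceeds the mean by `threshold` standard deviations), the use of raw term frequency in place of a proper term distribution model, and the function names `detect_bursts`, `burst_terms`, and `expand_query`.

```python
from collections import Counter
from datetime import date
from statistics import mean, pstdev


def detect_bursts(doc_dates, threshold=1.0):
    """Return the sorted days whose document count is unusually high.

    Assumption: "unusually many" means count > mean + threshold * std,
    computed over the days that occur in the result list.
    """
    counts = Counter(doc_dates)
    if not counts:
        return []
    values = list(counts.values())
    cutoff = mean(values) + threshold * pstdev(values)
    return sorted(day for day, n in counts.items() if n > cutoff)


def burst_terms(docs, bursts, k=3):
    """Pick the k most frequent terms among documents published in a burst.

    docs is a list of (date, list_of_terms) pairs. Raw frequency is a
    stand-in for a real term distribution model.
    """
    burst_days = set(bursts)
    tf = Counter()
    for day, terms in docs:
        if day in burst_days:
            tf.update(terms)
    return [term for term, _ in tf.most_common(k)]


def expand_query(query_terms, docs, threshold=1.0, k=3):
    """Merge the original query terms with the terms selected from bursts."""
    bursts = detect_bursts([day for day, _ in docs], threshold)
    expansion = burst_terms(docs, bursts, k)
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(list(query_terms) + expansion))


if __name__ == "__main__":
    # Hypothetical toy result list: four documents on one day form a burst.
    toy_docs = [
        (date(2010, 1, 1), ["election", "poll"]),
        (date(2010, 1, 3), ["result", "recount"]),
        (date(2010, 1, 3), ["result", "recount"]),
        (date(2010, 1, 3), ["result", "winner"]),
        (date(2010, 1, 3), ["election", "result", "recount"]),
        (date(2010, 1, 4), ["weather"]),
    ]
    print(expand_query(["election"], toy_docs, k=2))
```

A per-day histogram with a standard-deviation cutoff is only one simple way to operationalize "unusually many documents". Other bin sizes or burst detectors would slot into `detect_bursts` without changing the rest of the sketch.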

Keywords

Information retrieval; Temporal information retrieval; Query modeling

Notes

Acknowledgments

We are grateful to our reviewers for providing valuable feedback and suggestions. This research was partially supported by the European Union’s ICT Policy Support Programme as part of the Competitiveness and Innovation Framework Programme, CIP ICT-PSP under grant agreement nr 250430, the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreements nr 258191 (PROMISE Network of Excellence) and 288024 (LiMoSINe project), the Netherlands Organisation for Scientific Research (NWO) under project nrs 612.061.814, 612.061.815, 640.004.802, 727.011.005, 612.001.116, HOR-11-10, the Center for Creation, Content and Technology (CCCT), the BILAND project funded by the CLARIN-nl program, the Dutch national program COMMIT, the ESF Research Network Program ELIAS, the Elite Network Shifts project funded by the Royal Dutch Academy of Sciences (KNAW), and the Netherlands eScience Center under project number 027.012.105.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Maria-Hendrike Peetz (1)
  • Edgar Meij (1)
  • Maarten de Rijke (1)
  1. ISLA, University of Amsterdam, Amsterdam, The Netherlands