Information Retrieval Journal, Volume 20, Issue 4, pp 363–398

Identifying top relevant dates for implicit time sensitive queries

  • Ricardo Campos
  • Gaël Dias
  • Alípio Mário Jorge
  • Célia Nunes

Abstract

Despite clear improvements in temporal search and retrieval applications, current search engines remain mostly unaware of the temporal dimension. In most cases, systems are limited to letting the user restrict the search to a particular time period, or they simply rely on an explicitly specified time span. If the user is not explicit about his or her search intent (e.g., “philip seymour hoffman”), search engines are likely to fail to present an overall historical perspective of the topic and, in most such cases, retrieve only the most recent results. One possible solution to this shortcoming is to understand the different time periods of the query. In this context, most state-of-the-art methodologies consider any occurrence of temporal expressions in web documents and other web data as equally relevant to an implicit time sensitive query. To address this problem more adequately, we propose in this paper to detect the temporal expressions that are relevant to the query. Unlike previous metadata and query log-based approaches, we show how to achieve this goal based on information extracted from document content. Moreover, instead of focusing only on the detection of the most obvious date, we are also interested in retrieving the full set of dates that are relevant to the query. Towards this goal, we define a general similarity measure that makes use of co-occurrences of words and years based on corpus statistics, and a classification methodology that is able to identify the set of top relevant dates for a given implicit time sensitive query while filtering out the non-relevant ones. Through extensive experimental evaluation against several baselines on web corpora collections, we show that our approach offers promising results in the field of temporal information retrieval (T-IR).
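
To make the idea of scoring candidate years by their co-occurrence with query words more concrete, the following is a minimal sketch and not the similarity measure or classifier defined in the paper. It assumes a plain Dice coefficient computed over snippet-level co-occurrence counts, averaged over the query words, followed by a fixed threshold. The function names (`score_years`, `relevant_years`), the year pattern, and the threshold of 0.5 are all hypothetical choices made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a "year" is any four-digit token from 1000 to 2099.
YEAR_RE = re.compile(r"1[0-9]{3}|20[0-9]{2}")
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokens_of(text):
    """Return the set of lowercase alphanumeric tokens in a text."""
    return set(TOKEN_RE.findall(text.lower()))


def dice(a, b, doc_freq, pair_freq):
    """Dice coefficient of two tokens from snippet-level counts."""
    denom = doc_freq[a] + doc_freq[b]
    if denom == 0:
        return 0.0
    return 2.0 * pair_freq[frozenset((a, b))] / denom


def score_years(query, snippets):
    """Score each year found in the snippets against the query words.

    Illustrative stand-in only: the score is the mean Dice coefficient
    between the year and each query word.
    """
    doc_freq = Counter()   # number of snippets containing a token
    pair_freq = Counter()  # number of snippets containing both tokens
    years = set()
    for snippet in snippets:
        toks = tokens_of(snippet)
        doc_freq.update(toks)
        pair_freq.update(frozenset(p) for p in combinations(sorted(toks), 2))
        years.update(t for t in toks if YEAR_RE.fullmatch(t))

    q_words = TOKEN_RE.findall(query.lower())
    if not q_words:
        return {y: 0.0 for y in years}
    return {
        y: sum(dice(w, y, doc_freq, pair_freq) for w in q_words) / len(q_words)
        for y in years
    }


def relevant_years(query, snippets, threshold=0.5):
    """Keep the years whose score reaches the (hypothetical) threshold."""
    scores = score_years(query, snippets)
    return sorted(y for y, s in scores.items() if s >= threshold)
```

In this sketch a year that appears in the result set but never alongside the query words (for example, a copyright year in a page footer) receives a score of zero and is filtered out, which mirrors the abstract's point that not every temporal expression in the retrieved content is equally relevant to the query.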

Keywords

Temporal information retrieval; Implicit time sensitive queries; Temporal query understanding; Relevant temporal expressions

Notes

Acknowledgements

This research was funded by Project "TEC4Growth - Pervasive Intelligence, Enhancers and Proofs of Concept with Industrial Impact/NORTE-01-0145-FEDER-000020", which is financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF), and by national funds of FCT - Foundation for Science and Technology under UID/MAT/00212/2013.

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. ICT Department, Polytechnic Institute of Tomar, Tomar, Portugal
  2. LIAAD/INESC TEC - INESC Technology and Science, Porto, Portugal
  3. HULTECH/GREYC, University of Caen Basse-Normandie, Caen, France
  4. DCC – Faculty of Sciences, University of Porto, Porto, Portugal
  5. Department of Mathematics, University of Beira Interior, Covilhã, Portugal
  6. Center of Mathematics and Applications, University of Beira Interior, Covilhã, Portugal
