Despite clear improvements in temporal search and retrieval applications, current search engines remain largely unaware of the temporal dimension. In most cases, systems only let the user restrict the search to a particular time period, or rely on an explicitly specified time span. If the user is not explicit about his/her search intent (e.g., “philip seymour hoffman”), search engines are likely to fail to present an overall historical perspective of the topic, and in most such cases they simply retrieve the most recent results. One possible solution to this shortcoming is to understand the different time periods associated with the query. In this context, most state-of-the-art methodologies consider any occurrence of a temporal expression in web documents and other web data as equally relevant to an implicit time sensitive query. To approach this problem more adequately, we propose in this paper the detection of temporal expressions that are relevant to the query. Unlike previous metadata and query log-based approaches, we show how to achieve this goal based on information extracted from document content. Moreover, instead of focusing only on the detection of the most obvious date, we are also interested in retrieving the set of dates that are relevant to the query. Towards this goal, we define a general similarity measure that makes use of co-occurrences of words and years based on corpus statistics, and a classification methodology that is able to identify the set of top relevant dates for a given implicit time sensitive query while filtering out the non-relevant ones. Through extensive experimental evaluation against several baselines on web corpora collections, we show that our approach offers promising results in the field of temporal information retrieval (T-IR).
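The idea of scoring candidate years by their co-occurrence with the query, and then filtering out the weakly associated ones, can be illustrated with a minimal sketch. It is not the similarity measure or the classifier defined in the paper: it swaps in a plain first-order Dice coefficient computed over a handful of snippets, and the function names (`dice`, `score_years`), the year regular expression, and the 0.5 threshold are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumption: a candidate "year" is any 4-digit token between 1500 and 2099.
# A pattern this naive also matches non-dates such as "1500 photos".
YEAR_RE = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def dice(count_a, count_b, count_both):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    total = count_a + count_b
    return 2 * count_both / total if total else 0.0


def score_years(query, snippets, threshold=0.5):
    """Score each candidate year by its snippet-level co-occurrence with the query.

    Returns only the years whose Dice score reaches the (hypothetical) threshold.
    """
    query_terms = set(query.lower().split())
    n_query = 0                 # snippets containing every query term
    n_year = defaultdict(int)   # snippets containing a given year
    n_both = defaultdict(int)   # snippets containing both

    for text in snippets:
        lowered = text.lower()
        tokens = set(re.findall(r"\w+", lowered))
        has_query = query_terms <= tokens
        n_query += has_query
        for year in set(YEAR_RE.findall(lowered)):
            n_year[year] += 1
            if has_query:
                n_both[year] += 1

    scores = {y: dice(n_query, n_year[y], n_both[y]) for y in n_year}
    return {y: s for y, s in scores.items() if s >= threshold}


if __name__ == "__main__":
    # Toy snippets written for this example, not taken from any dataset.
    snippets = [
        "Philip Seymour Hoffman won an Oscar in 2006",
        "Philip Seymour Hoffman died in 2014",
        "Hoffman obituary 2014 philip seymour",
        "The 2011 movie Avatar sequel news",
    ]
    print(score_years("philip seymour hoffman", snippets))
```

With these toy snippets, 2014 and 2006 pass the threshold while 2011 is dropped, because it never co-occurs with the query terms. A fixed threshold is only a stand-in for the classification step the abstract mentions.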
http://dbs.ifi.uni-heidelberg.de/index.php?id=form-downloads [December 22, 2016].
We understand non-relevant dates as temporal patterns that, although they are dates, are not relevant to the query (e.g., avatar movie 2011), and wrong dates as those that, although they are temporal patterns, do not form a date (e.g., 1500 photos).
i.e., that co-occur at least once with d_j.
Please refer to Eq. 15 for the definition of the DICE coefficient.
http://trec.nist.gov [December 22, 2016].
http://trec.nist.gov/data/t13_novelty.html [December 22, 2016].
http://trec.nist.gov/data/t13_robust.html [December 22, 2016].
http://www.trec-ts.org [December 22, 2016].
http://ntcir.nii.ac.jp/Temporalia/NTCIR-11-Temporalia/ [December 22, 2016].
http://www.ccc.ipt.pt/~ricardo/datasets/WC_DS.html [December 22, 2016].
http://www.ccc.ipt.pt/~ricardo/datasets/WC_TREC_DS.html [December 22, 2016].
http://tm-websuiteapps.ipt.pt/GTEAspNetFlatTempCluster_Server/ [December 22, 2016].
https://www.microsoft.com/cognitive-services/en-us/bing-web-search-api [December 22, 2016].
http://www.diffbot.com [December 22, 2016].
http://bit.ly/2gk5DXX [December 22, 2016].
Due to the similarity of the equations.
Alonso, O., Baeza-Yates, R., & Gertz, M. (2009). Effectiveness of Temporal Snippets. In WWW’09 workshop on web search result summarization and presentation (WSSP’09). Madrid, Spain. April 20.
Alonso, O., Gertz, M., & Baeza-Yates, R. (2009). Clustering and exploring search results using timeline constructions. In Proceedings of the 18th international ACM conference on information and knowledge management (CIKM’09). Hong Kong, China. November 2–6 (pp. 97–106).
Bollegala, D., Matsuo, Y., & Ishizuka, M. (2007). Measuring semantic similarity between words using web search engines. In Proceedings of the 16th international world wide web conference (WWW’07). Banff, Canada. May 8–12 (pp. 757–766).
Brucato, M., & Danilo, M. (2014). Metric spaces for temporal information retrieval. In Proceedings of the European conference on IR research (ECIR’14). Amsterdam, Netherlands. April 13–16 (pp. 385–397).
Campos, R., Dias, G., & Jorge, A. M. (2011a). What is the temporal value of web snippets? In WWW’11 workshop on temporal web analytics (TWAW’11). Hyderabad, India. March 28.
Campos, R., Dias, G., Jorge, A. M., & Jatowt, A. (2014a). Survey of temporal information retrieval and related applications. ACM Computing Surveys, 47(2), 15.
Campos, R., Dias, G., Jorge, A. M., & Nunes, C. (2012). Enriching temporal query understanding through date identification: How to tag implicit temporal queries? In WWW’12 workshop on temporal web analytics (TempWeb’12). Lyon, France. April 17 (pp. 41–48).
Campos, R., Dias, G., Jorge, A. M., & Nunes, C. (2014b). GTE-cluster: A temporal search interface for implicit temporal queries. In Proceedings of the European conference on IR research (ECIR’14). Amsterdam, Netherlands. April 13–16 (pp. 775–779).
Campos, R., Dias, G., Jorge, A. M., & Nunes, C. (2014c). GTE-rank: Searching for implicit temporal query results. In Proceedings of the 23rd ACM international conference on information and knowledge management (CIKM’14). Shanghai, China. November 3–7 (pp. 2081–2083).
Campos, R., Jorge, A., & Dias, G. (2011b). Using web snippets and query-logs to measure implicit temporal intents in queries. In SIGIR’11 workshop on query representation and understanding (QRU’11). Beijing, China. July 28 (pp. 13–16).
Campos, R., Dias, G., Jorge, A., & Nunes, C. (2016). GTE-Rank: A time-aware search engine to answer time-sensitive queries. Information Processing & Management, 52(2), 273–298.
Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 23–29.
Cilibrasi, R. L., & Vitányi, P. M. (2007). The google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 19(3), 370–373.
Dakka, W., Gravano, L., & Ipeirotis, P. G. (2012). Answering general time sensitive queries. IEEE Transactions on Knowledge and Data Engineering, 24(2), 220–235.
Dias, G., Alves, E., & Lopes, J. (2007). Topic segmentation algorithms for text summarization and passage retrieval: An exhaustive evaluation. In Proceedings of the 22nd conference on artificial intelligence (AAAI’07). Vancouver, Canada. July 22–26 (pp. 1334–1340).
Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecological Society of America, 26, 297–302.
Efron, M., & Golovchinsky, G. (2011). Estimation methods for ranking recent information. In Proceedings of the 34th annual international ACM conference on research and development in information retrieval (SIGIR’11). Beijing, China. July 24–28 (pp. 495–504).
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.
Foley, J., & Allan, J. (2015). Retrieving time from scanned books. In Proceedings of the European conference on IR research (ECIR’15). Vienna, Austria. March 29–April 2 (pp. 221–232).
Guo, Q., Diaz, F., & Yom-Tov, E. (2013). Updating users about time critical events. Advances in information retrieval—Lecture Notes in Computer Science (Vol. 7814, pp. 483–494).
Gupta, D., & Berberich, K. (2014). Identifying time intervals of interest to queries. In Proceedings of the 23rd ACM international conference on information and knowledge management (CIKM’14). Shanghai, China. November 3–7 (pp. 1835–1838).
Harris, Z. (1954). Distributional structure. Word, 10(2–3), 146–162.
Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Société Vaudoise des Sciences Naturelles, 37, 547–579.
Jatowt, A., & Yeung, C. M. (2011). Extracting collective expectations about the future from large text collections. In Proceedings of the 20th ACM conference on information and knowledge management (CIKM’11). Glasgow, Scotland, UK. October 24–28 (pp. 1259–1264).
Jatowt, A., Yeung, A. C.-M., & Tanaka, M. (2013). Estimating document focus time. In Proceedings of the 22nd ACM conference on information and knowledge management (CIKM’13). San Francisco, USA. October 27–November 1 (pp. 2273–2278).
Joho, H., Jatowt, A., & Blanco, R. (2014). NTCIR temporalia: A test collection for temporal information access research. In WWW’14 workshop on temporal web analytics (TempWeb’14). Seoul, Korea. April 8 (pp. 845–849).
Jones, R., & Diaz, F. (2007). Temporal profiles of queries. ACM Transactions on Information Systems, 25(3), 14.
Kanhabua, N., Blanco, R., & Matthews, M. (2011). Ranking related news predictions. In Proceedings of the 34th annual international ACM conference on research and development in information retrieval (SIGIR’11). Beijing, China. July 24–28 (pp. 755–764).
Kanhabua, N., Blanco, R., & Nørvåg, K. (2015). Temporal information retrieval. Foundations and Trends in Information Retrieval, 9(2), 91–208.
Kanhabua, N., & Nørvåg, K. (2008). Improving temporal language models for determining time of non-timestamped documents. In Proceedings of the European conference on research and advanced technology for digital libraries (ECDL’08). Aarhus, Denmark. September 14–19 (pp. 358–370).
Kanhabua, N., & Nørvåg, K. (2010). Determining time of queries for re-ranking search results. In Proceedings of the European conference on research and advanced technology for digital libraries (ECDL’10). Glasgow, Scotland. September 6–10 (pp. 261–272).
Kanhabua, N., Romano, S., & Stewart, A. (2012). Identifying relevant temporal expressions for real-world events. In SIGIR’12 workshop on temporal, social and spatially-aware information access (TAIA’12). Portland, USA. August 16.
Katzell, R. A., & Cureton, E. E. (1947). Biserial correlation and prediction. The Journal of Psychology, 24(2), 273–278.
Kawai, H., Jatowt, A., Tanaka, K., Kunieda, K., & Yamada, K. (2010). ChronoSeeker: Search engine for future and past events. In Proceedings of the 4th international conference on ubiquitous information management and communication (ICUIMC’10). Suwon, Republic of Korea. January 14–15 (pp. 166–175).
Kulkarni, A., Teevan, J., Svore, K. M., & Dumais, S. T. (2011). Understanding temporal query dynamics. In Proceedings of the fourth ACM international conference on web search and data mining (WSDM’11). Hong Kong, China. February 9–12 (pp. 167–176).
Li, X., & Croft, W. B. (2003). Time-based language models. In Proceedings of the 12th ACM conference on information and knowledge management (CIKM’03). New Orleans, Louisiana, USA. November 2–8 (pp. 469–475).
Metzler, D., Jones, R., Peng, F., & Zhang, R. (2009). Improving search relevance for implicitly temporal queries. In Proceedings of the 32nd annual international ACM conference on research and development in information retrieval (SIGIR’09). Boston, USA. July 19–23 (pp. 700–701).
Moulahi, B., Lynda, T., & Sadok, B. Y. (2015). When time meets information retrieval: Past proposals, current plans and future trends. Journal of Information Science, 42(6), 1–24.
Nunes, S., Ribeiro, C., & David, G. (2008). Use of temporal expressions in web search. In Proceedings of the European conference on IR research (ECIR’08). Glasgow, Scotland. March 30–April 3 (pp. 580–584).
Peetz, M.-H., Meij, E., & de Rijke, M. (2014). Using temporal bursts for query modeling. Information Retrieval Journal, 17(1), 74–108.
Radinsky, K., Agichtein, E., Gabrilovich, E., & Markovitch, S. (2011). A word at a time: Computing word relatedness using temporal semantic analysis. In Proceedings of the 20th international conference on world wide web (WWW’11). Hyderabad, India. March 28–April 1 (pp. 337–346).
Ren, P., Chen, Z., Song, X., Li, B., Yang, H., & Ma, J. (2013). Understanding temporal intent of user query based on time-based query classification. In Proceedings of the natural language processing and Chinese computing conference (NLPCC’13). Chongqing, China. November 15–19 (pp. 334–345).
Shokouhi, M., & Radinsky, K. (2012). Time-sensitive query auto-completion. In Proceedings of the 35th annual international ACM conference on research and development in information retrieval (SIGIR’12). Portland, USA. August 12–16 (pp. 601–610).
Silva, J. F., Dias, G., Guilloré, S., & Pereira, J. G. (1999). Using local maxs algorithm for the extraction of contiguous and non-contiguous multiword lexical units. In Proceedings of the 9th Portuguese conference in artificial intelligence (EPIA’99). Évora, Portugal. September 21–24 (pp. 21–24).
Strötgen, J., Alonso, O., & Gertz, M. (2012). Identification of top relevant temporal expressions in documents. In WWW’12 workshop on temporal web analytics (TWAW’12). Lyon, France. April 17 (pp. 33–40).
Strötgen, J., & Gertz, M. (2015). A baseline temporal tagger for all languages. In Proceedings of the 2015 conference on empirical methods in natural language processing (EMNLP’15). Lisbon, Portugal. September 17–21 (pp. 541–547).
Tran, G., Herder, E., & Markert, K. (2015). Joint graphical models for date selection in timeline summarization. In Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing of the asian federation of natural language processing (ACL’15). Beijing, China. July 26–31 (pp. 1598–1607).
Tran, G., Tran, T., Tran, N. K., Alrifai, M., & Kanhabua, N. (2013). Leveraging learning to rank in an optimization framework for timeline summarization. In SIGIR 2013 workshop on temporal, social and spatially-aware information access (TAIA’13). Dublin, Ireland. August 1.
Turney, P. D. (2001). Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the 12th European conference on machine learning (ECML’01). Freiburg, Germany. September 5–7 (pp. 491–502).
Vlachos, M., Meek, C., Vagena, Z., & Gunopulos, D. (2004). Identifying similarities, periodicities and bursts for online search queries. In Proceedings of the international conference on management of data (ICMD’04). Paris, France. June 13–18 (pp. 131–142).
Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd ed.). San Francisco: Morgan Kaufmann.
This research was funded by Project "TEC4Growth - Pervasive Intelligence, Enhancers and Proofs of Concept with Industrial Impact/NORTE-01-0145-FEDER-000020", which is financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, and through the European Regional Development Fund (ERDF), and by national funds of FCT - Foundation for Science and Technology under UID/MAT/00212/2013.
A short version of this work has appeared in Proceedings of TempWeb@WWW’12 (Campos et al. 2012).
Cite this article
Campos, R., Dias, G., Jorge, A.M. et al. Identifying top relevant dates for implicit time sensitive queries. Inf Retrieval J 20, 363–398 (2017). https://doi.org/10.1007/s10791-017-9302-1
- Temporal information retrieval
- Implicit time sensitive queries
- Temporal query understanding
- Relevant temporal expressions