Towards Content Expiry Date Determination: Predicting Validity Periods of Sentences

  • Conference paper
Advances in Information Retrieval (ECIR 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11437)

Abstract

Knowing how long text content will remain valid is useful in many settings, such as helping authors create documents that stay useful longer, improving document retrieval, and enhancing credibility estimation. In this paper we introduce a novel research task: forecasting the validity period of content. Given an input sentence, the task is to determine approximately until when the information it states will remain valid. We propose machine learning approaches equipped with NLP and statistical features that work successfully with a relatively small amount of annotated data.

A. Almquist: This work was mainly done at the University of Gothenburg.

Notes

  1. http://www.timeml.org/tarsqi/modules/gutime/index.html.

  2. https://nlp.stanford.edu/software/sutime.html.

  3. https://github.com/HeidelTime/heideltime.

  4. We assume that content is valid at its creation time.

  5. https://en.wikipedia.org/wiki/Logarithmic_timeline.

  6. In the experiments in Sect. 5, we also test the case with a reduced set of three classes.

  7. Determining the approximate expiry date then requires extending the actual creation time of a sentence by its predicted validity period.

  8. https://radimrehurek.com/gensim/.

  9. Text dump from 2018-05-01.

  10. TIME, DATE and DURATION expressions often point to a specific point (or duration) in time, which means that they can be used as explicit markers for when information ceases to be valid. However, SET expressions, such as “every day”, do not.

  11. We used Google's pre-trained word embeddings, which were trained on news text: https://code.google.com/archive/p/word2vec/.

  12. https://wordnet.princeton.edu/.

  13. https://github.com/Ejhfast/empath-client.

  14. http://commoncrawl.org/.

  15. https://github.com/AxlAlm/ValidityPeriods-dataset.

  16. http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm.

  17. https://www.kaggle.com/snapcrack/all-the-news/data.

  18. http://kopiwiki.dsd.sztaki.hu/.

  19. https://www.nltk.org/.

  20. https://stanfordnlp.github.io/CoreNLP/.

  21. E.g., “I’m” to “I” and “am”.

  22. LSA, average word length, sentence length, POS tags, sentence embeddings, TempoWordNet.

  23. For LinearSVC we use \(C=0.7\).

  24. For the RBF kernel we use \(C=60\).

  25. We use 150 trees.

  26. Five neighbors are used together with distance weighting.

  27. https://keras.io/.

  28. http://scikit-learn.org/stable/.
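
The note on determining the approximate expiry date (extending a sentence's creation time by its predicted validity period) can be illustrated with a minimal Python sketch. The class labels and the durations attached to them are assumptions made only for this example, since the notes mention classes and a logarithmic timeline but do not list the classes here. `approximate_expiry` is a hypothetical helper, not the authors' code.

```python
from datetime import datetime, timedelta

# Hypothetical validity classes with assumed durations.
# These labels and lengths are placeholders for illustration only;
# they are not taken from the paper.
ASSUMED_CLASS_DURATION = {
    "hours": timedelta(hours=24),
    "days": timedelta(days=7),
    "weeks": timedelta(weeks=4),
    "months": timedelta(days=365),
}


def approximate_expiry(created_at: datetime, predicted_class: str) -> datetime:
    """Extend the creation time by the duration assumed for the predicted class."""
    if predicted_class not in ASSUMED_CLASS_DURATION:
        raise ValueError(f"unknown validity class: {predicted_class!r}")
    return created_at + ASSUMED_CLASS_DURATION[predicted_class]


if __name__ == "__main__":
    created = datetime(2018, 5, 1, 12, 0)
    # A sentence predicted to stay valid for "days" expires about a week later.
    print(approximate_expiry(created, "days"))
```

Because the predicted class is coarse, the resulting date is best read as an approximate bound rather than an exact expiry time.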

Acknowledgements

We thank Nina Tahmasebi for valuable comments and encouragement. This research has been supported by JSPS KAKENHI Grants (#17H01828, #18K19841) and by Microsoft Research Asia 2018 Collaborative Research Grant.

Author information

Corresponding author

Correspondence to Adam Jatowt.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Almquist, A., Jatowt, A. (2019). Towards Content Expiry Date Determination: Predicting Validity Periods of Sentences. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds) Advances in Information Retrieval. ECIR 2019. Lecture Notes in Computer Science, vol 11437. Springer, Cham. https://doi.org/10.1007/978-3-030-15712-8_6

  • DOI: https://doi.org/10.1007/978-3-030-15712-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-15711-1

  • Online ISBN: 978-3-030-15712-8

  • eBook Packages: Computer Science, Computer Science (R0)
