Towards Content Expiry Date Determination: Predicting Validity Periods of Sentences

Almquist, Axel; Jatowt, Adam

doi:10.1007/978-3-030-15712-8_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11437)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Knowing how long text content will remain valid can be useful in many cases, such as supporting the creation of documents to prolong their usefulness, improving document retrieval, or enhancing credibility estimation. In this paper, we introduce a novel research task of forecasting the validity period of content. Given an input sentence, the task is to determine approximately until when the information stated in it will remain valid. We propose machine learning approaches equipped with NLP and statistical features that can work successfully on a relatively small amount of annotated data.

A. Almquist—This work was mainly done at the University of Gothenburg.

Notes

1. http://www.timeml.org/tarsqi/modules/gutime/index.html.
2. https://nlp.stanford.edu/software/sutime.html.
3. https://github.com/HeidelTime/heideltime.
4. We assume that content is valid at its creation time.
5. https://en.wikipedia.org/wiki/Logarithmic_timeline.
6. In the experiments in Sec. 5, we also test the case with a reduced set of three classes.
7. Determining the approximate expiry date then requires extending the actual creation time of a sentence by its predicted validity period.
8. https://radimrehurek.com/gensim/.
9. Text dump from 2018-05-01.
10. TIME, DATE and DURATION expressions often refer to a specific point (or duration) in time, which means that they can be used as explicit markers for when information ceases to be valid. SET expressions, such as “every day”, do not.
11. We used Google's pre-trained word embeddings, trained on news data: https://code.google.com/archive/p/word2vec/.
12. https://wordnet.princeton.edu/.
13. https://github.com/Ejhfast/empath-client.
14. http://commoncrawl.org/.
15. https://github.com/AxlAlm/ValidityPeriods-dataset.
16. http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm.
17. https://www.kaggle.com/snapcrack/all-the-news/data.
18. http://kopiwiki.dsd.sztaki.hu/.
19. https://www.nltk.org/.
20. https://stanfordnlp.github.io/CoreNLP/.
21. E.g., “i’m” to “I” and “am”.
22. LSA, average word length, sentence length, POS-tags, sentence embeddings, TempoWordNet.
23. For LinearSVC we use \(C=0.7\).
24. For RBF we use \(C=60\).
25. We use 150 trees.
26. Five neighbors are used together with distance weighting.
27. https://keras.io/.
28. http://scikit-learn.org/stable/.
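
Notes 4 and 7 together suggest how a predicted validity period could be turned into an approximate expiry date: content is assumed valid at its creation time, and the creation time is extended by the predicted period. The following is a minimal sketch of that arithmetic only, not the authors' implementation. The class labels, the representative duration assigned to each class, and the function names `approximate_expiry` and `is_probably_valid` are all hypothetical and chosen for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical validity-period classes with one representative duration each.
# These labels and lengths are illustrative assumptions, not the paper's classes.
VALIDITY_PERIODS = {
    "hours": timedelta(hours=12),
    "days": timedelta(days=3),
    "weeks": timedelta(weeks=2),
    "months": timedelta(days=90),
    "years": timedelta(days=3 * 365),
}


def approximate_expiry(created_at: datetime, predicted_class: str) -> datetime:
    """Extend the creation time by the duration assigned to the predicted class."""
    return created_at + VALIDITY_PERIODS[predicted_class]


def is_probably_valid(created_at: datetime, predicted_class: str, now: datetime) -> bool:
    """Treat content as valid from its creation time until its approximate expiry."""
    return created_at <= now < approximate_expiry(created_at, predicted_class)
```

Because each class covers a range of durations, any single representative value gives only a rough expiry estimate; a real system might report an interval instead.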

Acknowledgements

We thank Nina Tahmasebi for valuable comments and encouragement. This research has been supported by JSPS KAKENHI Grants (#17H01828, #18K19841) and by Microsoft Research Asia 2018 Collaborative Research Grant.

Author information

Authors and Affiliations

SentiSum, London, UK
Axel Almquist
Kyoto University, Kyoto, 606-8501, Japan
Adam Jatowt

Corresponding author

Correspondence to Adam Jatowt.

Editor information

Editors and Affiliations

University of Strathclyde, Glasgow, UK
Leif Azzopardi
Bauhaus Universität Weimar, Weimar, Germany
Benno Stein
Universität Duisburg-Essen, Duisburg, Germany
Norbert Fuhr
GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
Philipp Mayr
Delft University of Technology, Delft, The Netherlands
Claudia Hauff
University of Twente, Enschede, The Netherlands
Djoerd Hiemstra

About this paper

Cite this paper

Almquist, A., Jatowt, A. (2019). Towards Content Expiry Date Determination: Predicting Validity Periods of Sentences. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds) Advances in Information Retrieval. ECIR 2019. Lecture Notes in Computer Science, vol 11437. Springer, Cham. https://doi.org/10.1007/978-3-030-15712-8_6

DOI: https://doi.org/10.1007/978-3-030-15712-8_6
Published: 07 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15711-1
Online ISBN: 978-3-030-15712-8
eBook Packages: Computer Science, Computer Science (R0)
