Using Time Series Analysis for Estimating the Time Stamp of a Text

Chiru, Costin-Gabriel; Toia, Madalina

doi:10.1007/978-3-319-55789-2_3

Part of the book series: Contributions to Statistics ((CONTRIB.STAT.))

Included in the following conference series:

International Work-Conference on Time Series Analysis

Abstract

Language is constantly changing, with words being created or disappearing over time. Moreover, the usage of individual words fluctuates under influences from different fields, such as historical events, cultural movements or scientific discoveries. These changes are reflected in written texts, so by tracking them one can determine when a text was written. In this paper, we present an application based on time series analysis, built on top of the Google Books N-gram corpus, that determines the time stamp of written texts. The application uses two heuristics: word fingerprinting, which finds the time interval in which each word was most probably used, and word importance within the given text, which weights the influence of each word's fingerprint on the estimated time stamp. Combining these two heuristics yields the time stamp of the text.
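The abstract does not spell out how the two heuristics are computed or combined, so the following is only a minimal sketch under stated assumptions, not the chapter's implementation. It assumes that a word's fingerprint is its frequency series normalised into a distribution over years, that a word's importance is its relative frequency in the text, and that the two are combined as a weighted sum whose peak gives the estimate. The names `YEARS`, `NGRAM_SERIES`, `fingerprint`, `word_importance` and `estimate_time_stamp` are hypothetical, and the frequency values are invented toy data standing in for series that a real system would read from the Google Books N-gram corpus.

```python
from collections import Counter

# Hypothetical toy data: one invented frequency value per word and year.
# A real system would load these series from the Google Books N-gram corpus.
YEARS = [1900, 1920, 1940, 1960, 1980, 2000]
NGRAM_SERIES = {
    "telegraph": [9.0, 7.0, 4.0, 2.0, 1.0, 0.5],
    "radio":     [0.1, 3.0, 8.0, 6.0, 4.0, 3.0],
    "computer":  [0.0, 0.0, 0.5, 2.0, 7.0, 9.0],
    "internet":  [0.0, 0.0, 0.0, 0.0, 0.5, 9.0],
}


def fingerprint(series):
    """Normalise a frequency series into a distribution over years (assumed definition)."""
    total = sum(series)
    if total == 0:
        return [0.0] * len(series)
    return [value / total for value in series]


def word_importance(tokens):
    """Weight each known word by its relative frequency in the text (assumed definition)."""
    counts = Counter(token for token in tokens if token in NGRAM_SERIES)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def estimate_time_stamp(text):
    """Return the year with the highest importance-weighted fingerprint score."""
    tokens = text.lower().split()  # naive whitespace tokenisation, no punctuation handling
    weights = word_importance(tokens)
    if not weights:
        return None  # no word of the text has a known series
    scores = [0.0] * len(YEARS)
    for word, weight in weights.items():
        for i, probability in enumerate(fingerprint(NGRAM_SERIES[word])):
            scores[i] += weight * probability
    best = max(range(len(YEARS)), key=scores.__getitem__)
    return YEARS[best]
```

With this toy data, a text dominated by "computer" and "internet" is pulled towards the latest year in the table, while one dominated by "telegraph" is pulled towards the earliest. Taking the peak of the combined score is one possible design choice; a weighted mean year or a smoothed series would be equally plausible alternatives.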

Acknowledgements

This work has been funded by University Politehnica of Bucharest, through the “Excellence Research Grants” Program, UPB–GEX. Identifier: UPB–EXCELENȚĂ–2016 Aplicarea metodelor de învățare automată în analiza seriilor de timp (Applying machine learning techniques in time series analysis), Contract number 09/26.09.2016.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Politehnica University from Bucharest, 313 Splaiul Independenței, Bucharest, Romania
Costin-Gabriel Chiru & Madalina Toia

Corresponding author

Correspondence to Costin-Gabriel Chiru.

Editor information

Editors and Affiliations

CITIC-UGR, University of Granada, Granada, Spain
Ignacio Rojas
CITIC-UGR, University of Granada, Granada, Spain
Héctor Pomares
CITIC-UGR, University of Granada, Granada, Spain
Olga Valenzuela

Cite this paper

Chiru, CG., Toia, M. (2017). Using Time Series Analysis for Estimating the Time Stamp of a Text. In: Rojas, I., Pomares, H., Valenzuela, O. (eds) Advances in Time Series Analysis and Forecasting. ITISE 2016. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-55789-2_3

DOI: https://doi.org/10.1007/978-3-319-55789-2_3
Published: 03 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55788-5
Online ISBN: 978-3-319-55789-2
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)