Using Time Series Analysis for Estimating the Time Stamp of a Text

  • Conference paper
Advances in Time Series Analysis and Forecasting (ITISE 2016)

Part of the book series: Contributions to Statistics (CONTRIB.STAT.)

Abstract

Language is constantly changing, with words being created or disappearing over time. Moreover, the usage of individual words tends to fluctuate under the influence of different fields, such as historical events, cultural movements or scientific discoveries. These changes are reflected in written texts, so by tracking them one can determine when a given text was written. In this paper, we present an application based on time series analysis, built on top of the Google Books N-gram corpus, that determines the time stamp of written texts. The application uses two heuristics: word fingerprinting, which finds the time interval in which each word was most probably used, and word importance within the given text, which weights the influence of each word's fingerprint on the estimated time stamp. Combining the two heuristics yields the estimated time stamp of the text.
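The two heuristics described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption, not the authors' method. `NGRAM_FREQ` is hypothetical toy data standing in for per-year relative frequencies from the Google Books N-gram corpus. A word's fingerprint is modeled as its moving-average-smoothed frequency series normalized into a distribution over years, not as a single interval. `importance` is a hypothetical weight: the word's count in the text times the log of its inverse mean corpus frequency, so rare, period-specific words count more than function words. The fingerprints are then combined as an importance-weighted mixture, and its most probable year is taken as the estimate.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy data standing in for Google Books N-gram statistics:
# word -> {year: relative frequency of the word among all words printed that year}.
YEARS = list(range(1850, 2001, 25))  # 1850, 1875, ..., 2000
NGRAM_FREQ = {
    "telegraph": dict(zip(YEARS, [4e-6, 9e-6, 7e-6, 3e-6, 1e-6, 5e-7, 2e-7])),
    "internet": dict(zip(YEARS, [0.0, 0.0, 0.0, 0.0, 0.0, 1e-7, 3e-5])),
    "email": dict(zip(YEARS, [0.0, 0.0, 0.0, 0.0, 0.0, 2e-8, 1e-5])),
    "the": dict(zip(YEARS, [5e-2] * len(YEARS))),
}


def smooth(series, window=3):
    """Centered moving average over the year-ordered series (edges average fewer points)."""
    years = sorted(series)
    values = [series[y] for y in years]
    half = window // 2
    smoothed = {}
    for i, year in enumerate(years):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed[year] = sum(values[lo:hi]) / (hi - lo)
    return smoothed


def fingerprint(word, freq=NGRAM_FREQ):
    """Word fingerprint (assumed form): smoothed frequency series normalized into P(year | word)."""
    series = smooth(freq[word])
    total = sum(series.values())
    if total == 0:
        return None  # the word never occurs in the corpus
    return {year: value / total for year, value in series.items()}


def importance(word, counts, freq=NGRAM_FREQ):
    """Hypothetical weight: occurrences in the text times log(1 / mean corpus frequency)."""
    mean_freq = sum(freq[word].values()) / len(freq[word])
    return counts[word] * math.log(1.0 / mean_freq)


def estimate_year(text, freq=NGRAM_FREQ):
    """Return (most probable year, distribution over years) for the given text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t in freq)  # ignore words missing from the corpus
    mixture = Counter()
    total_weight = 0.0
    for word in counts:
        fp = fingerprint(word, freq)
        if fp is None:
            continue
        weight = importance(word, counts, freq)
        total_weight += weight
        for year, p in fp.items():
            mixture[year] += weight * p
    if total_weight == 0:
        return None, {}
    dist = {year: value / total_weight for year, value in sorted(mixture.items())}
    return max(dist, key=dist.get), dist


year, dist = estimate_year("The Internet and email replaced the telegraph.")
print(year)  # 2000 with this toy data
```

For the sample sentence, the mixture peaks at 2000 because the two rare, modern words outweigh both the flat fingerprint of "the" and the nineteenth-century peak of "telegraph". A weighted mixture is used here rather than a product of distributions, so that a word absent from the corpus in some years does not force those years to zero probability.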


Acknowledgements

This work was funded by the University Politehnica of Bucharest through the “Excellence Research Grants” Program, UPB–GEX. Identifier: UPB–EXCELENȚĂ–2016 Aplicarea metodelor de învățare automată în analiza seriilor de timp (Applying machine learning techniques in time series analysis), Contract number 09/26.09.2016.

Author information

Correspondence to Costin-Gabriel Chiru.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Chiru, CG., Toia, M. (2017). Using Time Series Analysis for Estimating the Time Stamp of a Text. In: Rojas, I., Pomares, H., Valenzuela, O. (eds) Advances in Time Series Analysis and Forecasting. ITISE 2016. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-55789-2_3