When was Macbeth Written? Mapping Book to Time

  • Aminul Islam
  • Jie Mei
  • Evangelos E. Milios
  • Vlado Kešelj
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9041)


We address the problem of predicting when a book was written, using the Google Books Ngram corpus. Such a prediction could be useful for authorship and plagiarism detection, identification of literary movements, and forensic document examination. We propose an unsupervised approach and compare it with four baseline measures on a dataset of 36 books written between 1551 and 1969. The proposed approach could be applied to other languages, provided that corpora similar to the Google Books Ngram corpus are available for those languages.


Keywords: Baseline Measure; Prediction Quality; Computational Linguistics; Unique Word; Evaluation Dataset
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Aminul Islam (1)
  • Jie Mei (1)
  • Evangelos E. Milios (1)
  • Vlado Kešelj (1)

  1. Faculty of Computer Science, Dalhousie University, Halifax, Canada
