Computers and the Humanities, Volume 35, Issue 2, pp 193–214

Computer-Based Authorship Attribution Without Lexical Measures


  • E. Stamatatos
    • Dept. of Electrical and Computer Engineering, University of Patras
  • N. Fakotakis
    • Dept. of Electrical and Computer Engineering, University of Patras
  • G. Kokkinakis
    • Dept. of Electrical and Computer Engineering, University of Patras

DOI: 10.1023/A:1002681919510

Cite this article as:
Stamatatos, E., Fakotakis, N. & Kokkinakis, G. Computers and the Humanities (2001) 35: 193. doi:10.1023/A:1002681919510


The most important approaches to computer-assisted authorship attribution are exclusively based on lexical measures that either represent the vocabulary richness of the author or simply comprise frequencies of occurrence of common words. In this paper we present a fully-automated approach to the identification of the authorship of unrestricted text that excludes any lexical measure. Instead we adapt a set of style markers to the analysis of the text performed by an already existing natural language processing tool using three stylometric levels, i.e., token-level, phrase-level, and analysis-level measures. The latter represent the way in which the text has been analyzed. The presented experiments on a Modern Greek newspaper corpus show that the proposed set of style markers is able to distinguish reliably the authors of a randomly-chosen group and performs better than a lexically-based approach. However, the combination of these two approaches provides the most accurate solution (i.e., 87% accuracy). Moreover, we describe experiments on various sizes of the training data as well as tests dealing with the significance of the proposed set of style markers.

Copyright information

© Kluwer Academic Publishers 2001