Breaking the Closed-World Assumption in Stylometric Authorship Attribution

  • Ariel Stolerman
  • Rebekah Overdorf
  • Sadia Afroz
  • Rachel Greenstadt
Conference paper

DOI: 10.1007/978-3-662-44952-3_13

Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 433)
Cite this paper as:
Stolerman A., Overdorf R., Afroz S., Greenstadt R. (2014) Breaking the Closed-World Assumption in Stylometric Authorship Attribution. In: Peterson G., Shenoi S. (eds) Advances in Digital Forensics X. DigitalForensics 2014. IFIP Advances in Information and Communication Technology, vol 433. Springer, Berlin, Heidelberg

Abstract

Stylometry is a form of authorship attribution that relies on the linguistic information found in a document. While there has been significant work in stylometry, most research focuses on the closed-world problem where the author of the document is in a known suspect set. For open-world problems where the author may not be in the suspect set, traditional classification methods are ineffective. This paper proposes the “classify-verify” method that augments classification with a binary verification step evaluated on stylometric datasets. This method, which can be generalized to any domain, significantly outperforms traditional classifiers in open-world settings and yields an F1-score of 0.87, comparable to traditional classifiers in closed-world settings. Moreover, the method successfully detects adversarial documents where authors deliberately change their styles, a problem for which closed-world classifiers fail.
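
The abstract describes classify-verify only at a high level: a classifier picks the most likely author from the suspect set, and a binary verification step then accepts or rejects that pick. A minimal sketch of that two-stage idea follows. It does not reproduce the paper's features, classifiers, or verification methods, which the abstract does not specify. The function-word list, the nearest-centroid classifier, the distance threshold, and all function names are assumptions chosen purely for illustration.

```python
import math
from collections import Counter

# Hypothetical feature set: relative frequencies of a few function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is"]


def features(text):
    """Map a document to a vector of function-word relative frequencies."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def train(docs_by_author):
    """Build one centroid per suspect author (assumed stand-in classifier)."""
    return {
        author: centroid([features(doc) for doc in docs])
        for author, docs in docs_by_author.items()
    }


def classify_verify(model, text, threshold):
    """Return the predicted author, or None if verification rejects it."""
    x = features(text)
    # Classify: choose the closest author in the suspect set.
    author, dist = min(
        ((name, distance(x, c)) for name, c in model.items()),
        key=lambda pair: pair[1],
    )
    # Verify: accept only if the document is close enough to that author.
    return author if dist <= threshold else None
```

Under these assumptions, calling `classify_verify(model, text, threshold)` returns an author name when the verification step accepts the classifier's choice and `None` when the document looks unlike every suspect, which is how the sketch models the open-world case. The threshold trades false accepts against false rejects and would have to be tuned on held-out data.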

Keywords

Forensic stylometry, authorship attribution, authorship verification

Copyright information

© IFIP International Federation for Information Processing 2014

Authors and Affiliations

  • Ariel Stolerman (1)
  • Rebekah Overdorf (1)
  • Sadia Afroz (2)
  • Rachel Greenstadt (1)
  1. Drexel University, Philadelphia, USA
  2. Computer Science Division, University of California at Berkeley, Berkeley, USA
