Using Dependency-Based Annotations for Authorship Identification

  • Charles Hollingsworth
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7499)

Abstract

Most statistical approaches to stylometry to date have focused on lexical methods, such as relative word frequencies or type-token ratios. Explicit attention to syntactic features has been comparatively rare. Approaches that do use syntactic features have typically relied either on very shallow features (such as parts of speech) or on features based on phrase structure grammars. This paper investigates whether typed dependency grammars might yield useful stylometric features.

An experiment was conducted using a novel method of representing information about typed dependencies. Each token in a text is replaced with a “DepWord,” which consists of a concise representation of the chain of grammatical dependencies from that token back to the root of the sentence. The resulting representation contains only syntactic information, with no lexical or orthographic information. These DepWords can then be used in place of the original words as the input for statistical language processing methods.
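The idea of replacing each token with its dependency chain can be illustrated with a minimal sketch. The abstract does not specify the exact DepWord encoding, so everything here is an assumption for illustration: the function name `depwords`, the input format (one head index and one relation label per token, as a dependency parser might produce), and the hyphen-joined output are all hypothetical.

```python
from typing import List


def depwords(heads: List[int], labels: List[str], sep: str = "-") -> List[str]:
    """Replace each token with the chain of dependency labels up to the root.

    heads[i]  : 0-based index of token i's head, or -1 if token i is the root.
    labels[i] : dependency relation between token i and its head.
    The joined-label encoding is a hypothetical stand-in for the paper's
    "concise representation", which the abstract does not define.
    """
    if len(heads) != len(labels):
        raise ValueError("heads and labels must have the same length")

    result = []
    for i in range(len(heads)):
        chain = []
        seen = set()
        j = i
        # Walk from the token toward the root, collecting relation labels.
        while j != -1:
            if j in seen:
                raise ValueError("cycle in dependency structure")
            seen.add(j)
            chain.append(labels[j])
            j = heads[j]
        result.append(sep.join(chain))
    return result


# Hand-written parse of "The dog barked loudly" (labels are illustrative).
heads = [1, 2, -1, 2]
labels = ["det", "nsubj", "root", "advmod"]
example = depwords(heads, labels)
```

In this sketch the output keeps no trace of the original word forms, so any downstream frequency statistics are computed over syntactic positions alone.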

I adapted a simple method of authorship attribution (nearest neighbor based on word frequency rankings) for use with DepWords, and found that it performed comparably to the same technique trained on words or parts of speech, even outperforming lexical methods in some cases. This indicates that the grammatical dependency relations between words contain stylometric information sufficient for distinguishing authorship. These results suggest that further research into typed-dependency-based stylometry might prove fruitful.
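A nearest-neighbor classifier over frequency rankings might look roughly like the following sketch. The abstract does not give the details of the distance measure, so this is one plausible reading rather than the paper's method: the helper names (`rank_vector`, `rank_distance`, `nearest_author`), the tie handling (tied tokens share their average rank), and the `vocab_size` cutoff are all assumptions. The tokens can be words, part-of-speech tags, or DepWords.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def rank_vector(tokens: Sequence[str], vocab: Sequence[str]) -> Dict[str, float]:
    """Rank vocab items by frequency in tokens (1 = most frequent).

    Items with equal counts share the average of the ranks they span.
    """
    counts = Counter(tokens)
    ordered = sorted(vocab, key=lambda t: -counts[t])
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of items tied with ordered[i].
        while j + 1 < len(ordered) and counts[ordered[j + 1]] == counts[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1
    return ranks


def rank_distance(a: Sequence[str], b: Sequence[str], vocab: Sequence[str]) -> float:
    """Sum of absolute rank differences over the shared vocabulary."""
    ranks_a = rank_vector(a, vocab)
    ranks_b = rank_vector(b, vocab)
    return sum(abs(ranks_a[t] - ranks_b[t]) for t in vocab)


def nearest_author(
    unknown: Sequence[str],
    training: List[Tuple[str, Sequence[str]]],
    vocab_size: int = 50,
) -> str:
    """Return the author label of the training text closest to unknown."""
    # Vocabulary: the most frequent tokens pooled over all training texts.
    pooled = Counter(tok for _, toks in training for tok in toks)
    vocab = [tok for tok, _ in pooled.most_common(vocab_size)]
    best = min(training, key=lambda pair: rank_distance(unknown, pair[1], vocab))
    return best[0]


# Toy data: token sequences stand in for full texts.
training = [
    ("A", ["x", "x", "x", "y", "z"]),
    ("B", ["z", "z", "z", "y", "x"]),
]
unknown = ["x", "x", "y", "z", "x"]
predicted = nearest_author(unknown, training)
```

Because the classifier only sees token sequences, swapping words for DepWords requires no change to the code, which is what makes the comparison between lexical and syntactic features straightforward.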

Keywords

stylometry, authorship attribution, syntax, dependency grammar, DepWords

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Charles Hollingsworth (1, 2)
  1. Institute for Artificial Intelligence, The University of Georgia, Athens, USA
  2. Applied Systems Intelligence, Alpharetta, USA