Future Trends in Authorship Attribution

  • Patrick Juola
Part of the IFIP — The International Federation for Information Processing book series (IFIPAICT, volume 242)

Abstract

Authorship attribution, the science of inferring characteristics of an author from the characteristics of documents written by that author, is a problem with a long history and a wide range of applications. This paper surveys the history and present state of the discipline, which remains essentially a collection of ad hoc methods with little formal data available to select among them. It also makes some predictions about the needs of the discipline and discusses how these needs might be met.

Keywords

Authorship attribution, stylometrics, text forensics

Copyright information

© International Federation for Information Processing 2007

Authors and Affiliations

  • Patrick Juola (1)
  1. Computer Science at Duquesne University, Pittsburgh
