Computers and the Humanities, Volume 28, Issue 2, pp 87–106

Authorship attribution

  • David I. Holmes
Article

Abstract

This paper considers the problem of quantifying literary style and looks at several variables which may be used as stylistic “fingerprints” of a writer. A review of work done on the statistical analysis of “change over time” in literary style is then presented, followed by a look at a specific application area, the authorship of Biblical texts.

Key Words

stylometry authorship vocabulary model multivariate 

Copyright information

© Kluwer Academic Publishers 1994

Authors and Affiliations

  • David I. Holmes (1)
  1. Department of Mathematical Sciences, University of the West of England, Bristol, UK
