Computers and the Humanities, Volume 29, Issue 4, pp 259–270

On the utility of content analysis in author attribution: The Federalist

  • Colin Martindale
  • Dean McKenzie
Article

Abstract

In studies of author attribution, measurement of differential use of function words is the most common procedure, though lexical statistics are often used. Content analysis has seldom been employed. We compare the success of lexical statistics, content analysis, and function words in classifying the 12 disputed Federalist papers. Of course, Mosteller and Wallace (1964) have presented overwhelming evidence that all 12 were by James Madison rather than by Alexander Hamilton. Our purpose is not to challenge these attributions but rather to use The Federalist as a test case. We found lexical statistics to be of no use in classifying the disputed papers. Using both classical canonical discriminant analysis and a neural-network approach, content analytic measures (the Harvard III Psychosociological Dictionary and semantic differential indices) were found to be successful at attributing most of the disputed papers to Madison. However, a function-word approach is more successful. We argue that content analysis can be useful in cases where the function-word approach does not yield compelling conclusions and, perhaps, in preliminary screening in cases where there are a large number of possible authors.
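
To make the function-word idea concrete, here is a minimal sketch in Python. It is not the authors' procedure: the abstract names canonical discriminant analysis and a neural network, whereas this sketch swaps in a much simpler nearest-centroid rule. The marker list `FUNCTION_WORDS` and the helper names are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical marker list for illustration only; a real study would
# select function words empirically from texts of known authorship.
FUNCTION_WORDS = ("upon", "whilst", "while", "on")


def rates_per_thousand(text, words=FUNCTION_WORDS):
    """Return the rate of each marker word per 1000 tokens of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [1000.0 * counts[w] / total for w in words]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_author(text, samples_by_author):
    """Assign text to the author whose mean rate vector is closest.

    This is a nearest-centroid rule with squared Euclidean distance,
    a simple stand-in for a discriminant analysis.
    """
    target = rates_per_thousand(text)
    best = None
    for author, samples in samples_by_author.items():
        c = centroid([rates_per_thousand(s) for s in samples])
        dist = sum((a - b) ** 2 for a, b in zip(target, c))
        if best is None or dist < best[0]:
            best = (dist, author)
    return best[1]
```

In use, `samples_by_author` would map each candidate author to a list of texts of undisputed authorship, and `nearest_author` would be called once per disputed text. With only a handful of marker words the result is fragile, so this is best read as a screening step rather than a conclusive attribution.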

Key words

author attribution, content analysis, discriminant analysis, lexical statistics, neural networks, The Federalist

References

  1. Anderson, C. W. and G. E. McMaster. “Quantification of the Brothers Grimm: A Comparison of Successive Versions of Three Tales”. Computers and the Humanities, 23 (1989), 341–46.
  2. Caudill, M. and C. Butler. Naturally Intelligent Systems. Cambridge, MA: MIT Press, 1990.
  3. Damerau, F. J. “The Use of Function Word Frequencies as Indicators of Style”. Computers and the Humanities, 9 (1975), 271–80.
  4. Elliott, W. and R. Valenza. “Who Was Shakespeare?” Chance, 4 (1991a), 8–14.
  5. Elliott, W. and R. Valenza. “A Touchstone for the Bard”. Computers and the Humanities, 25 (1991b), 199–209.
  6. Fix, E. and J. L. Hodges. Discriminatory Analysis, Nonparametric Discrimination: Consistency Properties. Report 4. Randolph Field, TX: USAF School of Aviation Medicine, 1951.
  7. Forsyth, R. S. “Neural Learning Algorithms: Some Empirical Trials”. In Proceedings of the Third Conference on Neural Nets and their Applications. Nanterre, France, 1990, pp. 301–17.
  8. Frautschi, R. L. “Lexical and Focal Preferences in Rousseau's Profession de Foi du Vicaire Savoyard (Book IV of Emile)”. Computers and the Humanities, 23 (1989), 347–55.
  9. Freud, S. “Psychopathology of Everyday Life”. In Basic Writings of Sigmund Freud. Ed. A. A. Brill. New York: Modern Library, 1938. (Original work published, 1904).
  10. Hand, D. J. Discrimination and Classification. New York: Wiley, 1981.
  11. Heise, D. R. “Semantic Differential Profiles for 1000 Most Frequent English Words”. Psychological Monographs, 79 (1965), 1–31.
  12. Holmes, D. I. “Authorship Attribution”. Computers and the Humanities, 28 (1994), 87–106.
  13. Horton, T. B. The Effectiveness of the Stylometry of Function Words in Discriminating Between Fletcher and Shakespeare. Unpublished Ph.D. dissertation. University of Edinburgh, 1987.
  14. Kruskal, J. B. “Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis”. Psychometrika, 29 (1964), 1–28, 114–29.
  15. Martindale, C. “COUNT: A PL/I Program for Content Analysis of Natural Language”. Behavioral Science, 18 (1973), 1948.
  16. Martindale, C. “LEXSTAT: A PL/I Program for Computation of Lexical Statistics”. Behavior Research Methods and Instrumentation, 6 (1974), 571.
  17. Martindale, C. The Clockwork Muse: The Predictability of Artistic Change. New York: Basic Books, 1990.
  18. Martindale, C. Cognitive Psychology: A Neural-network Approach. Pacific Grove, CA: Brooks/Cole, 1991.
  19. Matthews, R. A. J. and T. V. N. Merriam. “Neural Computation in Stylometry I: An Application to the Works of Shakespeare and Fletcher”. Literary and Linguistic Computing, 4 (1993), 203–209.
  20. McKenzie, D. P. and R. S. Forsyth. “Classification by Similarity: An Overview of Statistical Methods of Case-Based Reasoning”. Computers in Human Behavior (in press).
  21. Mendenhall, T. C. “A Mechanical Solution of a Literary Problem”. Popular Science, 60 (1901), 97–105.
  22. Merriam, T. V. N. Modelling a Canon: A Stylometric Examination of Shakespeare's First Folio. Unpublished Ph.D. dissertation. University of London, 1992.
  23. Merriam, T. V. N. “Marlowe's Hand in Edward III”. Literary and Linguistic Computing, 8 (1993), 59–72.
  24. Mosteller, F. and D. L. Wallace. Inference and Disputed Authorship: The Federalist. Reading, MA: Addison-Wesley, 1964.
  25. Mosteller, F. and D. L. Wallace. Applied Bayesian and Classical Inference: The Case of the Federalist Papers. 2nd. Ed. New York: Springer-Verlag, 1984.
  26. Osgood, C. E., G. Suci and P. H. Tannenbaum. The Measurement of Meaning. Urbana, IL: University of Illinois Press, 1957.
  27. Rokeach, M., R. Homant and L. Penner. “A Value Analysis of the Disputed Federalist Papers”. Journal of Personality and Social Psychology, 16 (1970), 245–50.
  28. SAS Institute, Inc. SAS User's Guide: Statistics. Cary, NC: SAS Institute, 1985.
  29. Siegel, S. and N. J. Castellan. Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill, 1988.
  30. Sigelman, L. and W. Jacoby. “The Not-So-Simple Art of Imitation: Pastiche, Literary Style, and Raymond Chandler”. Computers and the Humanities (in press).
  31. Specht, D. F. “Probabilistic Neural Networks”. Neural Networks, 3 (1990), 109–18.
  32. Specht, D. F. and P. D. Shapiro. “Generalized Accuracy of Probabilistic Neural Networks Compared with Back-Propagation Networks”. In Proceedings of the International Joint Conference on Neural Networks. Seattle, WA, 1991, pp. 887–92.
  33. Spence, D. P., H. S. Scarborough and E. H. Ginsberg. “Lexical Correlates of Cervical Cancer”. Social Science and Medicine, 12 (1978), 141–44.
  34. SPSS, Inc. SPSS Reference Guide. Chicago: SPSS, Inc., 1990.
  35. Stone, P. J. et al. The General Inquirer: A Computer Approach to Content Analysis. Cambridge, MA: MIT Press, 1966.
  36. Ward Systems Group. NeuroWindows: Neural Network Dynamic Link Library. Frederick, MD: Ward Systems Group, 1992.
  37. Williams, C. B. “A Note on the Statistical Analysis of Sentence-Length as a Criterion of Literary Style”. Biometrika, 31 (1939), 356–61.
  38. Yule, G. U. “On Sentence-Length as a Statistical Characteristic of Style in Prose: With Application to Two Cases of Disputed Authorship”. Biometrika, 30 (1938), 363–90.
  39. Yule, G. U. The Statistical Study of Literary Vocabulary. Cambridge: Cambridge University Press, 1944.

Copyright information

© Kluwer Academic Publishers 1995

Authors and Affiliations

  • Colin Martindale (1)
  • Dean McKenzie (2)
  1. Department of Psychology, University of Maine, Orono, USA
  2. Psychological Medicine, Monash University, Melbourne, Australia
