Abstract
In studies of author attribution, measurement of differential use of function words is the most common procedure, though lexical statistics are often used. Content analysis has seldom been employed. We compare the success of lexical statistics, content analysis, and function words in classifying the 12 disputed Federalist papers. Of course, Mosteller and Wallace (1964) have presented overwhelming evidence that all 12 were by James Madison rather than by Alexander Hamilton. Our purpose is not to challenge these attributions but rather to use The Federalist as a test case. We found lexical statistics to be of no use in classifying the disputed papers. Using both classical canonical discriminant analysis and a neural-network approach, content-analytic measures (the Harvard III Psychosociological Dictionary and semantic differential indices) were found to be successful at attributing most of the disputed papers to Madison. However, a function-word approach is more successful. We argue that content analysis can be useful in cases where the function-word approach does not yield compelling conclusions and, perhaps, in preliminary screening in cases where there are a large number of possible authors.
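As a rough illustration of what a function-word approach measures, the following minimal sketch computes the rate of a few function words per 1,000 tokens and assigns a text to the closer of two author profiles. It is not the procedure used in the article (which relied on canonical discriminant analysis and a neural network), nor the Bayesian analysis of Mosteller and Wallace. The word list, the names function_word_rates and nearest_author, and all profile numbers are assumptions invented for this example.

```python
import re
from collections import Counter

# Hypothetical marker list, chosen only for illustration.
FUNCTION_WORDS = ["upon", "whilst", "while"]


def function_word_rates(text, words=FUNCTION_WORDS):
    """Return occurrences per 1,000 tokens for each listed function word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: 1000.0 * counts[w] / total for w in words}


def nearest_author(rates, profiles):
    """Return the author whose profile is closest in squared Euclidean distance.

    This is a simple nearest-centroid rule, a stand-in for the
    discriminant methods named in the abstract.
    """
    def dist(profile):
        return sum((rates[w] - profile[w]) ** 2 for w in profile)

    return min(profiles, key=lambda author: dist(profiles[author]))


if __name__ == "__main__":
    # Toy profiles with made-up rates; not estimates from any real corpus.
    profiles = {
        "author_a": {"upon": 3.0, "whilst": 0.0, "while": 0.3},
        "author_b": {"upon": 0.2, "whilst": 0.5, "while": 0.0},
    }
    sample = "upon the whole it depends upon the case"
    rates = function_word_rates(sample)
    print(rates)
    print(nearest_author(rates, profiles))
```

In practice the profiles would be mean rates estimated from each candidate author's undisputed texts, and a short sample such as the one above would be far too small to support any attribution.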
References
Anderson, C. W. and G. E. McMaster. “Quantification of the Brothers Grimm: A Comparison of Successive Versions of Three Tales”. Computers and the Humanities, 23 (1989), 341–46.
Caudill, M. and C. Butler. Naturally Intelligent Systems. Cambridge, MA: MIT Press, 1990.
Damerau, F. J. “The Use of Function Word Frequencies as Indicators of Style”. Computers and the Humanities, 9 (1975), 271–80.
Elliott, W. and R. Valenza. “Who Was Shakespeare?” Chance, 4 (1991a), 8–14.
Elliott, W. and R. Valenza. “A Touchstone for the Bard”. Computers and the Humanities, 25 (1991b), 199–209.
Fix, E. and J. L. Hodges. Discriminatory Analysis, Nonparametric Discrimination: Consistency Properties. Report 4. Randolph Field, TX: USAF School of Aviation Medicine, 1951.
Forsyth, R. S. “Neural Learning Algorithms: Some Empirical Trials”. In Proceedings of the Third Conference on Neural Nets and their Applications. Nanterre, France, 1990, pp. 301–17.
Frautschi, R. L. “Lexical and Focal Preferences in Rousseau's Profession de Foi du Vicaire Savoyard (Book IV of Emile)”. Computers and the Humanities, 23 (1989), 347–55.
Freud, S. “Psychopathology of Everyday Life”. In Basic Writings of Sigmund Freud. Ed. A. A. Brill. New York: Modern Library, 1938. (Original work published, 1904).
Hand, D. J. Discrimination and Classification. New York: Wiley, 1981.
Heise, D. R. “Semantic Differential Profiles for 1000 Most Frequent English Words”. Psychological Monographs, 79 (1965), 1–31.
Holmes, D. I. “Authorship Attribution”. Computers and the Humanities, 28 (1994), 87–106.
Horton, T. B. The Effectiveness of the Stylometry of Function Words in Discriminating Between Fletcher and Shakespeare. Unpublished Ph.D. dissertation. University of Edinburgh, 1987.
Kruskal, J. B. “Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis”. Psychometrika, 29 (1964), 1–28, 114–29.
Martindale, C. “COUNT: A PL/I Program for Content Analysis of Natural Language”. Behavioral Science, 18 (1973), 1948.
Martindale, C. “LEXSTAT: A PL/I Program for Computation of Lexical Statistics”. Behavior Research Methods and Instrumentation, 6 (1974), 571.
Martindale, C. The Clockwork Muse: The Predictability of Artistic Change. New York: Basic Books, 1990.
Martindale, C. Cognitive Psychology: A Neural-network Approach. Pacific Grove, CA: Brooks/Cole, 1991.
Matthews, R. A. J. and T. V. N. Merriam. “Neural Computation in Stylometry I: An application to the Works of Shakespeare and Fletcher”. Literary and Linguistic Computing, 4 (1993), 203–209.
McKenzie, D. P. and R. S. Forsyth. “Classification by Similarity: An Overview of Statistical Methods of Case-Based Reasoning”. Computers in Human Behavior (in press).
Mendenhall, T. C. “A Mechanical Solution of a Literary Problem”. Popular Science, 60 (1901), 97–105.
Merriam, T. V. N. Modelling a Canon: A Stylometric Examination of Shakespeare's First Folio. Unpublished Ph.D. dissertation. University of London, 1992.
Merriam, T. V. N. “Marlowe's Hand in Edward III”. Literary and Linguistic Computing, 8 (1993), 59–72.
Mosteller, F. and D. L. Wallace. Inference and Disputed Authorship: The Federalist. Reading, MA: Addison-Wesley, 1964.
Mosteller, F. and D. L. Wallace. Applied Bayesian and Classical Inference: The Case of the Federalist Papers. 2nd ed. New York: Springer-Verlag, 1984.
Osgood, C. E., G. Suci and P. H. Tannenbaum. The Measurement of Meaning. Urbana, IL: University of Illinois Press, 1957.
Rokeach, M., R. Homant and L. Penner. “A Value Analysis of the Disputed Federalist Papers”. Journal of Personality and Social Psychology, 16 (1970), 245–50.
SAS Institute, Inc. SAS User's Guide: Statistics. Cary, NC: SAS Institute, 1985.
Siegel, S. and N. J. Castellan. Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill, 1988.
Sigelman, L. and W. Jacoby. “The Not-So-Simple Art of Imitation: Pastiche, Literary Style, and Raymond Chandler”. Computers and the Humanities (in press).
Specht, D. F. “Probabilistic Neural Networks”. Neural Networks, 3 (1990), 109–18.
Specht, D. F. and P. D. Shapiro. “Generalized Accuracy of Probabilistic Neural Networks Compared with Back-Propagation Networks”. In Proceedings of the International Joint Conference on Neural Networks. Seattle, WA, 1991, pp. 887–92.
Spence, D. P., H. S. Scarborough and E. H. Ginsberg. “Lexical Correlates of Cervical Cancer”. Social Science and Medicine, 12 (1978), 141–44.
SPSS, Inc. SPSS Reference Guide. Chicago: SPSS, Inc., 1990.
Stone, P. J. et al. The General Inquirer: A Computer Approach to Content Analysis. Cambridge, MA: MIT Press, 1966.
Ward Systems Group. NeuroWindows: Neural Network Dynamic Link Library. Frederick, MD: Ward Systems Group, 1992.
Williams, C. B. “A Note on the Statistical Analysis of Sentence-Length as a Criterion of Literary Style”. Biometrika, 31 (1939), 356–61.
Yule, G. U. “On Sentence-Length as a Statistical Characteristic of Style in Prose: With Application to Two Cases of Disputed Authorship”. Biometrika, 30 (1938), 363–90.
Yule, G. U. The Statistical Study of Literary Vocabulary. Cambridge: Cambridge University Press, 1944.
Author information
Colin Martindale is Professor of Psychology at the University of Maine. He is author of a number of articles and books on content analysis, literary history, and other topics. A recent book is The Clockwork Muse: The Predictability of Artistic Change (New York: Basic Books). He is Executive Editor of Empirical Studies of the Arts and serves on the editorial boards of Computers and the Humanities and Poetics.
Dean McKenzie is Professional Officer/Statistician for Psychological Medicine, Monash University, Melbourne, Australia. He is author of several articles concerned with machine learning and artificial intelligence.
Cite this article
Martindale, C., McKenzie, D. On the utility of content analysis in author attribution: The Federalist. Comput Hum 29, 259–270 (1995). https://doi.org/10.1007/BF01830395