The Rest of the Story: Finding Meaning in Stylistic Variation

The Structure of Style

Abstract

Computational stylistics, the computational analysis of the style of natural language texts, seeks to develop automated methods to (1) effectively distinguish texts with one stylistic character from those with another, and (2) give a meaningful representation of the differences between textual styles. Such methods have many potential applications in areas including criminal and national security forensics, customer relations management, spam/scam filtering, and scholarly research. In this chapter, we propose a framework for research in computational stylistics, based on a functional model of the communicative act. We illustrate the utility of this framework via several case studies.

Notes

  1. We note that insofar as preferences for certain topics may be intimately related to other aspects of the communicative act that we consider within the purview of style, variation in content variables may be a legitimate and useful object of study. In particular, see the discussion in Sect. 5.4.

  2. Metamorphism refers to changes in mineral assemblage and texture in rocks that have been subjected to temperatures and pressures different from those under which they originally formed.

Corresponding author

Correspondence to Shlomo Argamon.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Argamon, S., Koppel, M. (2010). The Rest of the Story: Finding Meaning in Stylistic Variation. In: Argamon, S., Burns, K., Dubnov, S. (eds) The Structure of Style. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12337-5_5

  • DOI: https://doi.org/10.1007/978-3-642-12337-5_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12336-8

  • Online ISBN: 978-3-642-12337-5

  • eBook Packages: Computer Science (R0)
