Linguistic Ethnography: Identifying Dominant Word Classes in Text

  • Rada Mihalcea
  • Stephen Pulman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5449)


In this paper, we propose a method for "linguistic ethnography" – a general mechanism for characterising texts with respect to the dominance of certain classes of words. Using humour as a case study, we explore the automatic learning of salient word classes, including semantic classes (e.g., person, animal), psycholinguistic classes (e.g., tentative, cause), and affective load (e.g., anger, happiness). We measure the reliability of the derived word classes and their associated dominance scores by showing significant correlation across different corpora.


Keywords: News Article; Word Class; Lexical Resource; Human Language Technology; Dominance Score
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Rada Mihalcea (1, 2)
  • Stephen Pulman (2)
  1. Computer Science Department, University of North Texas, USA
  2. Computational Linguistics Group, Oxford University, UK
