Humor norms for 4,997 English words
Humor ratings for 4,997 English words were collected from 821 participants using an online crowd-sourcing platform. Each participant rated 211 words on a scale from 1 (humorless) to 5 (humorous). To allow comparisons across norms, words were chosen from a set common to a number of previously collected norms (e.g., arousal, valence, dominance, concreteness, age of acquisition, and reaction time). The complete dataset provides researchers with a list of humor ratings and includes information on gender, age, and educational differences. Analyses show that the ratings have reliability on a par with previous ratings and are not well predicted by existing norms.
Keywords: Humor, Crowd-sourcing, Ratings, Gender differences
The appreciation of humor is a fundamental, albeit mysterious, part of human cognition. We laugh at things like Monty Python and the work of Douglas Adams, but find topics like mass shootings and the Holocaust off limits. Other topics, like sunsets and freedom, may lie somewhere in between. What makes one thing funnier than another? And what makes some topics inviolable in relation to humor? To help develop this research, we provide the first set of humor norms for a large collection of 4,997 common words. The aim of providing this data is to help enrich the resources available for understanding the cognitive, developmental, and applied aspects of humor.
Humor has a long history of theoretical investigation. Darwin (1872) called humor “tickling the mind.” Thomas Hobbes (1840) referred to it as a feeling of “sudden glory.” These represent a selection from a long list of efforts to provide a theory of humor (reviewed in Hurley, Dennett, & Adams, 2011; Keith-Spiegel, 1972; Wyer & Collins, 1992). These efforts include biological theories, such as the Darwin-Hecker hypothesis that humor is a cognitive analogue of physical tickling (Fridlund & Loftis, 1990; Harris & Christenfeld, 1997); superiority theories, such as Hobbes’s notion of “sudden glory” over another individual or one’s previous self (Hobbes, 1840); release theories, such as that proposed by Spencer (1860) and later Freud (1928), that humor is a means of reducing excessive arousal; and incongruity-resolution theories (Shultz, 1976; Suls, 1972), perhaps first noted by Kant (1790/1914) in his observation that “In everything that is to excite a lively convulsive laugh there must be something absurd,” and later developed by Schopenhauer (for an overview, see Roeckelein, 2006), who suggested the “ludicrous” required a “contrast…between representation of perception and abstract representations.” Still further theories have focused on the adaptive value of humor as an error correction mechanism and faulty logic detection system (Minsky, 1981), most recently and thoroughly developed by Hurley, Dennett, and Adams (2011). A similar version of this theory has been called the benign violation theory (McGraw & Warren, 2010), which suggests a person must realize that a stimulus is incongruous with their expectations (violation), but also that this incongruity is not harmful given the context (benign).
The onslaught of theories aimed at understanding humor reflects our common experience that humor is a key ingredient in what it means to be a healthy human. It may even be uniquely human and, continuing the noble history of validating intuition with Latin, Koestler (1964) referred to humans as Homo ridens, “laughing man” (see also Milner, 1972). Whether or not it is unique to humans, humor has well-documented influences on well-being and health, including self-concept, coping with stress, and positive affect (Cann & Collette, 2014; Galloway & Cropley, 1999; Martin et al., 1993; Mora-Ripoll, 2011). Humor research also contains a wide body of literature concerned with understanding adult and child personality development (Martin, 1998; McGhee, 1971) and gender differences (Abel & Flick, 2012; Hay, 1995; Mickes, Walker, Parris, Mankoff, & Christenfeld, 2012). The latter are associated with the evolutionary hypothesis that humor plays a role in male mating displays (McGee & Shevlin, 2009), a hypothesis further supported by gender differences in the brain's response to humor (Azim, Mobbs, Jo, Menon, & Reiss, 2005; see also Goel & Dolan, 2001).
In addition, cracking the riddle of what makes things funny has been the motivation for a number of computational algorithms designed to create humor, such as JAPE (Binsted, Pain, & Ritchie, 1997), STANDUP (Manurung et al., 2008), WISCRAIC (McKay, 2002), and HAHAcronym (Stock & Strapparava, 2003), as well as algorithms to detect and classify humor (Davidov, Tsur, & Rappoport, 2010; Mihalcea & Strapparava, 2005).
Much of the theory and empirical work briefly outlined above focuses on complete multi-word jokes, such as this zinger by Steven Wright: “I couldn’t repair your brakes, so I made your horn louder.” To this end, a number of studies have rated jokes and created joke databases in an effort to allow researchers to disaggregate the various mechanisms that make them work (e.g., Goldberg, Roeder, Gupta, & Perkins, 2001; Wicker, Thorelli, Barron III, & Willis, 1981). A few studies have looked at single non-words (Westbury, Shaoul, Moroschan, & Ramscar, 2016), suggesting that the absurdity of a non-word is associated with its humor. None, to our knowledge, have focused on single English words.
The database we present here offers a basis for studying humor in perhaps a highly rudimentary “fruit fly” version, at the level of a single word. If single words have reliable humor ratings, they provide humor in miniature, allowing us to investigate humor in relation to the many existing lexical norms. These include some that are directly related to past theories – such as Freud’s (1928) arousal theory – and others that offer at least some insight into processing and expectation, such as reaction times and frequency.
The collection of the humor norms follows on previous work demonstrating the advantage of crowd-sourcing in psychological norm development: for example, Warriner, Kuperman, and Brysbaert (2013) collected valence, arousal, and dominance ratings for 13,915 English words; Brysbaert, Warriner, and Kuperman (2014) collected concreteness ratings for nearly 40,000 English words; and Kuperman, Stadthagen-Gonzalez, and Brysbaert (2012) collected age of acquisition ratings for 30,000 English words. These were in turn based on the value of previous norms, such as the Affective Norms for English Words, provided by Bradley and Lang (1999).¹ Still other normative ratings have covered different word properties and have provided the basis for investigating their influence on cognition, such as imageability and familiarity (Stadthagen-Gonzalez & Davis, 2006), pleasantness (Bellezza, Greenwald, & Banaji, 1986), and meaningfulness (Paivio, Yuille, & Madigan, 1968).
These normative datasets have proven highly fruitful. For illustration, Dodds et al. (2015) used valence ratings to assess a universal positivity bias. Alhothali and Hoey (2015) used valence ratings to predict readers’ responses to news articles. And Hills and colleagues (Hills & Adelman, 2015; Hills, Adelman, & Noguchi, 2016) used concreteness, age of acquisition, and lexical reaction times to evaluate the changing history of American English over the last 200 years.
Here, we provide a large dataset of single-word humor ratings along with the demographics of the raters. The list of rated words was formed from the intersection of overlapping previous non-humor word norms, allowing us to analyze how word-level humor relates to valence, arousal, word length, concreteness, word processing time, and word frequency. In addition, breaking down our dataset by demographics, we report humor ratings separately by gender.
The words in the norms were chosen from the intersection of the valence, arousal, and dominance norms (Warriner, Kuperman, & Brysbaert, 2013), age of acquisition norms (Kuperman, Stadthagen-Gonzalez, & Brysbaert, 2012), lexical decision norms (Keuleers, Lacey, Rastle, & Brysbaert, 2012), and frequency norms (Van Heuven, Mandera, Keuleers, & Brysbaert, 2014). This resulted in 7,775 words, from which the final word list of 5,000 words was randomly sampled. Given a fixed number of participants, this reduction in list size increases the number of raters exposed to each word.
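The selection procedure described above (intersect several norm vocabularies, then draw a random subset) can be sketched as follows. This is a minimal illustration rather than the authors' actual script: the function name `sample_common_words`, the seed, and the toy vocabularies are all assumptions introduced here, and the real norm files would have to be loaded separately.

```python
import random


def sample_common_words(norm_vocabularies, sample_size, seed=0):
    """Return a sorted random sample of words found in every vocabulary.

    norm_vocabularies: iterable of word collections, one per norm set.
    sample_size: number of shared words to draw (5,000 in the paper).
    seed: hypothetical seed, used only to make the sketch reproducible.
    """
    common = set.intersection(*(set(v) for v in norm_vocabularies))
    if sample_size > len(common):
        raise ValueError("sample_size exceeds the number of shared words")
    rng = random.Random(seed)
    # Sort before sampling so the draw depends only on the seed,
    # not on the arbitrary iteration order of a set.
    return sorted(rng.sample(sorted(common), sample_size))


# Toy vocabularies standing in for the four published norm sets.
affect = {"apple", "bank", "cat", "dog", "egg"}
acquisition = {"apple", "cat", "dog", "egg", "fig"}
lexical_decision = {"apple", "bank", "cat", "dog", "egg", "fig"}
frequency = {"apple", "cat", "dog", "egg", "gum"}

word_list = sample_common_words(
    [affect, acquisition, lexical_decision, frequency], sample_size=3
)
print(word_list)  # three of the four shared words
```

With the real data, the intersection would contain the 7,775 shared words and `sample_size` would be 5,000.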
Table: Calibrator words presented to participants (column: mean humor rating in the pilot)
Data collection and participants
You will rate how you felt while reading each word. There will be approximately 200 words. The rating scale ranges from 1 (humorless = not funny at all) to 5 (humorous = most funny). At one extreme of the scale, you find the word dull or unfunny; in that case, you should give the word a rating of 1. At the other extreme of the scale, you feel the word is amusing or likely to be associated with humorous thought or language (for example, it is absurd, amusing, hilarious, playful, silly, whimsical, or laughable); in this case, you should give the word a rating of 5. The scale also allows you to describe intermediate levels of humor; if you feel the word is neutral (neither humorous nor humorless), select the middle of the scale (rating 3).
After you fill out some basic information about yourself, a word list will appear. Simply click the most accurate humor rating for each word. Once you finish rating the words, we will ask you a couple of questions about the way you use humor. Please work at a rapid pace and don't spend too much time thinking about each word. Rather, make your ratings based on your first and immediate reaction as you read each word.
The introduction was followed by the list of 211 words, each word having five buttons presented just below it, numbered from 1 to 5, with the extremes labeled “humorless” (1) and “humorous” (5). The first 11 words were the calibrator words. The combination of the remaining 200 words was different across participants. After selecting a rating for a word, the word disappeared from the list. Upon rating all words, the participant could press the “Submit” button. The participant was then presented with a debrief page and directed back to Amazon. Each participant was paid US$1. The study took approximately 15 min to complete, including reading the instructions and the debrief page.
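The list structure described above (11 calibrator words followed by 200 further words that differ across participants) can be sketched as below. The text does not say how the 200 words were assigned, so uniform random sampling is an assumption here, and the function names and placeholder word labels are hypothetical. The second function shows why a smaller word pool yields more raters per word.

```python
import random


def build_participant_list(calibrators, word_pool, n_random=200, rng=None):
    """Calibrators first, then n_random words drawn without replacement.

    Uniform random sampling is an assumption; the paper only states that
    the combination of the 200 words differed across participants.
    """
    rng = rng or random.Random()
    return list(calibrators) + rng.sample(list(word_pool), n_random)


def expected_raters_per_word(n_participants, n_random, pool_size):
    """Average number of participants who see any given non-calibrator word."""
    return n_participants * n_random / pool_size


# Placeholder labels; the real calibrator and target words are not used here.
calibrators = [f"calibrator_{i}" for i in range(11)]
pool = [f"word_{i}" for i in range(4986)]

trial_list = build_participant_list(calibrators, pool, rng=random.Random(1))
print(len(trial_list))  # 211

# 821 retained participants and 4,986 non-calibrator words, as reported
# in the text: about 32.93 raters per word.
print(round(expected_raters_per_word(821, 200, 4986), 2))
# With the full 7,775-word intersection the same sample of participants
# would give only about 21 raters per word.
print(round(expected_raters_per_word(821, 200, 7775), 2))
```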
The rating task was presented to 950 participants. Of these, 102 were removed due to incomplete submissions, errors in the data, or improperly submitted responses. Five participants were removed due to low variability in their responses (the standard deviation of their humor ratings, on a 1–5 scale, was smaller than 0.2, indicating they chose roughly the same value for all words). Twenty-two participants were removed because they indicated their primary language was not English. The final data consisted of 821 participants. The raw data had 173,231 individual data points, each referring to a single rating of a single word. Ratings were collected for 4,997 words, with each word rated by at least 15 participants. The average number of participants rating a non-calibrator word was 33 (M = 32.93, SD = 5.64, n = 4,986). The 11 calibrators were rated by all 821 participants.
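The exclusion steps and the bookkeeping in this paragraph can be sketched as follows. The record layout (`complete`, `primary_language`, `ratings`) and the function name are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was applied. The final lines check that the reported counts are mutually consistent.

```python
from statistics import pstdev

MIN_SD = 0.2  # threshold reported in the text


def screen_participant(record):
    """Return (keep, reason) for one participant record.

    record is a hypothetical dict with keys 'complete' (bool),
    'primary_language' (str) and 'ratings' (list of ints from 1 to 5).
    """
    if not record["complete"]:
        return False, "incomplete or invalid submission"
    # Population SD is an assumption; the paper does not name the estimator.
    if pstdev(record["ratings"]) < MIN_SD:
        return False, "low variability"
    if record["primary_language"].strip().lower() != "english":
        return False, "primary language not English"
    return True, "kept"


# Made-up records for illustration only.
participants = [
    {"complete": True, "primary_language": "English", "ratings": [1, 3, 2, 5, 1]},
    {"complete": True, "primary_language": "English", "ratings": [3, 3, 3, 3, 3]},
    {"complete": True, "primary_language": "Spanish", "ratings": [1, 4, 2, 2, 5]},
    {"complete": False, "primary_language": "English", "ratings": []},
]
decisions = [screen_participant(p) for p in participants]
print(decisions)

# Consistency of the counts reported in the text.
assert 950 - 102 - 5 - 22 == 821   # retained participants
assert 821 * 211 == 173231         # individual ratings
assert 4997 - 11 == 4986           # non-calibrator words
```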
Table: Education distribution of the participants (columns: number of participants, % of participants; categories include Some High School, High School Diploma, and Higher than Postgraduate Degree)
Table: Descriptive statistics of mean humor ratings (MHR)
Table: Words with the most extreme mean humor ratings
Table: Correlations between 11 lexical measures (including Mean Humor Rating and Age of Acquisition)
Table: Words with the largest differences between male and female ratings (columns: words rated more humorous by males; words rated more humorous by females)
Table: Words with the lowest differences in gender, while scoring high on mean humor rating (MHR) (column: gender difference, MHR_M - MHR_F)
To allow for further investigation of age differences, we also provide the MHR for younger and older participants separately. The mean age of all participants was 35 years (M = 35.37, SD = 11.74, n = 821), with a median value of 32. The two groups (younger and older) were constructed by a median split of the dataset. The younger group consists of participants with age ≤32 (n = 424, M = 26.7, SD = 3.52, min = 18, max = 32), the older group of participants with age >32 (n = 397, M = 44.7, SD = 10.2, min = 33, max = 78). The overall humor ratings of the younger participants (MY = 2.42, SDY = 0.49) were comparable to those of the older participants (MO = 2.41, SDO = 0.48). The ratings of the younger and older groups are strongly correlated, r(4,995) = .63, p < .001.
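The median split and the between-group correlation can be sketched as below. The helper names and all numeric values are made up for illustration; only the rule (age at or below the median counts as younger) and the degrees of freedom (number of words minus 2) come from the text.

```python
from math import sqrt
from statistics import mean, median


def median_split(ages):
    """Label each age 'younger' (<= median) or 'older' (> median)."""
    cutoff = median(ages)
    labels = ["younger" if age <= cutoff else "older" for age in ages]
    return labels, cutoff


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up ages: the median is 32, so 32 falls in the younger group.
labels, cutoff = median_split([19, 25, 32, 40, 58])
print(cutoff, labels)

# Made-up per-word mean ratings for the two groups (one entry per word).
mhr_younger = [1.2, 2.4, 3.1, 2.0, 3.8]
mhr_older = [1.5, 2.1, 2.9, 2.6, 3.3]
r = pearson_r(mhr_younger, mhr_older)

# Degrees of freedom for a correlation over the 4,997 rated words.
degrees_of_freedom = 4997 - 2
print(round(r, 2), degrees_of_freedom)
```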
Table: Words with the largest rating differences between younger and older participants (columns: words rated more humorous by younger; words rated more humorous by older)
The supplementary material contains age-separate ratings for each word, allowing for further analyses of age differences in humor ratings.
Taking advantage of the ready availability of large-scale online data collection, the present study has created a database of single-word humor ratings. The statistical analyses show that people view words as humorous to a varying extent, with a skew towards seeing the majority of words as humorless. The appraisal of single-word humor can be reliably measured across participants, with reliability similar to that of arousal ratings.
The present study shows examples of analyses that can be carried out with the humor dataset. Specifically, it is possible to show correlational relationships between humor rating and other variables (i.e., frequency and lexical reaction times). This approach may, in turn, inform us on how the underlying mechanisms of humor work, or at the very least, where to look in the future. Additionally, it is possible to investigate gender differences in humor appraisal.
Besides the above-mentioned examples, we identify several fields of interest for future research. First, using existing databases of jokes (e.g., Goldberg, Roeder, Gupta, & Perkins, 2001), the humor ratings make it possible to explore the relationship between the appraisal of humor on the joke level and on the single-word level. Second, the humor norms provide a resource for machine learning methods to establish the best predictors of word-level humor, which can later be evaluated in psychological experiments. Third, individual ratings of words in relation to the norms can provide a basis for understanding individual differences in humor styles (e.g., Martin, Puhlik-Doris, Larsen, Gray, & Weir, 2003). Finally, like previous ratings, the humor norms may offer new insights into text analysis and the creation of psychological stimuli.
The mean humor ratings are freely available as part of our dataset. The data can be accessed at https://github.com/tomasengelthaler/HumorNorms, downloadable as a .csv file. The sheet is organized alphabetically, by word label. It includes the mean humor rating for all participants combined (mean_ALL), along with the standard deviation (sd_ALL) and the number of participants rating a word (n_ALL). The same three variables are available exclusively for participants identifying as male (mean_M/sd_M/n_M) and for those identifying as female (mean_F/sd_F/n_F). Additionally, the variables are also presented according to the median split of age, dividing participants into a younger group (age ≤32; mean_young/sd_young/n_young) and an older group (age >32; mean_old/sd_old/n_old).
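As a rough sketch of how the file might be used, the snippet below reads a table with the column names listed above and ranks words by the gender difference (mean_M minus mean_F). The name of the word column (`word`) is an assumption, since the text only says the sheet is organized by word label, and the three rows are invented placeholder values, not entries from the dataset.

```python
import csv
import io

# Invented rows for illustration; the column 'word' is an assumed name.
TOY_CSV = """word,mean_ALL,sd_ALL,n_ALL,mean_M,sd_M,n_M,mean_F,sd_F,n_F
alpha,2.50,1.10,30,2.80,1.20,14,2.20,1.00,16
beta,1.40,0.70,33,1.30,0.60,15,1.50,0.80,18
gamma,3.60,1.20,35,3.50,1.30,17,3.70,1.10,18
"""


def gender_differences(csv_file):
    """Return (word, mean_M - mean_F) pairs, largest difference first."""
    rows = []
    for row in csv.DictReader(csv_file):
        difference = float(row["mean_M"]) - float(row["mean_F"])
        rows.append((row["word"], round(difference, 2)))
    return sorted(rows, key=lambda pair: pair[1], reverse=True)


ranked = gender_differences(io.StringIO(TOY_CSV))
for word, difference in ranked:
    print(f"{word}: {difference:+.2f}")
```

To run the same function on the downloaded file, replace the `io.StringIO` object with an open file handle, for example `open(path, newline="")`, where `path` points to the local copy of the .csv file.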
¹ Dutch (Moors et al., 2013), Finnish (Söderholm, Häyry, Laine, & Karrasch, 2013), French (Monnier & Syssau, 2014), German (Kanske & Kotz, 2010), Italian (Montefinese, Ambrosini, Fairfield, & Mammarella, 2014), Portuguese (Soares, Comesaña, Pinheiro, Simões, & Frade, 2012), and Spanish (Redondo, Fraga, Padrón, & Comesaña, 2007).
Thanks to Marc Brysbaert and Victor Kuperman for input on the design and implementation of the ratings. We appreciate the help of Thomas Cordua-von Specht in programming the crowd-sourcing platform. Additional thanks to Masitah, Li Ying, Eva Jimenez, and Kita Sotaro for input on the manuscript.
- Alhothali, A., & Hoey, J. (2015). Good news or bad news: Using affect control theory to analyze readers’ reaction towards news articles. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1548–1558). Denver, CO: The Association for Computational Linguistics.
- Bradley, M. M., & Lang, P. J. (1999). Affective norms for English words (ANEW): Instruction manual and affective ratings (pp. 1–45). Technical Report C-1. The Center for Research in Psychophysiology, University of Florida.
- Darwin, C. (1872). The Expression of the Emotions in Man and Animals. London: John Murray.
- Davidov, D., Tsur, O., & Rappoport, A. (2010). Semi-supervised recognition of sarcastic sentences in twitter and amazon. In Proceedings of the fourteenth conference on computational natural language learning (pp. 107–116). Association for Computational Linguistics.
- Dodds, P. S., Clark, E. M., Desu, S., Frank, M. R., Reagan, A. J., Williams, J. R., … Megerdoomian, K. (2015). Human language reveals a universal positivity bias. Proceedings of the National Academy of Sciences, 112, 2389–2394.
- Freud, S. (1928). Humour. International Journal of Psychoanalysis, 9, 1–6.
- Hay, J. (1995). Gender and humour: Beyond a joke. Wellington, New Zealand: MA thesis, Victoria University of Wellington.
- Hills, T. T., Adelman, J. S., & Noguchi, T. (2016). Attention economies, information crowding, and language change. In M. N. Jones (Ed.), Big Data in Cognitive Science. Psychology Press.
- Hobbes, T. (1840). Human Nature. In W. Molesworth (Ed.), The English Works of Thomas Hobbes Of Malmesbury, 4th ed. London: Bohn.
- Hurley, M. M., Dennett, D. C., & Adams, R. B. (2011). Inside jokes: Using humor to reverse-engineer the mind. Cambridge: MIT Press.
- Kant, I. (1914). The Critique of Judgement (J. H. Bernard, Trans.). London: Macmillan. (Original work published 1790).
- Keith-Spiegel, P. (1972). Early conceptions of humor: Varieties and issues. In J. H. Goldstein & P. E. McGhee (Eds.), The Psychology of Humor: Theoretical Perspectives and Empirical Issues (pp. 4–39). New York: Academic Press.
- Koestler, A. (1964). The act of creation. New York: Penguin Books.
- McKay, J. (2002). Generation of idiom-based witticisms to aid second language learning. In Proceedings of the Twente Workshop on Language Technology, 20. The University of Twente.
- Minsky, M. (1981). Jokes and their Relation to the Cognitive Unconscious. In L. Vaina & J. Hintikka (Eds.), Cognitive Constraints on Communication (pp. 175–200). Boston: Reidel.
- Mihalcea, R., & Strapparava, C. (2005). Making computers laugh: Investigations in automatic humor recognition. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (pp. 531–538).
- Moors, A., De Houwer, J., Hermans, D., Wanmaker, S., Van Schie, K., Van Harmelen, A. L., … Brysbaert, M. (2013). Norms of valence, arousal, dominance, and age of acquisition for 4,300 Dutch words. Behavior Research Methods, 45, 169–177.
- Roeckelein, J. (2006). Elsevier's dictionary of psychological theories. Amsterdam [Netherlands]: Elsevier.
- Shultz, T. R. (1976). A cognitive-developmental analysis of humour. In A. J. Chapman & H. C. Foot (Eds.), Humor and laughter: Theory, research, and applications (pp. 11–36). London: John Wiley & Sons.
- Suls, J. M. (1972). A Two-Stage Model for the Appreciation of Jokes and Cartoons: An Information-Processing Analysis. In J. H. Goldstein & P. E. McGhee (Eds.), The Psychology of Humor: Theoretical Perspectives and Empirical Issues (pp. 81–100). New York: Academic Press.
- Spencer, H. (1860). The physiology of laughter. Macmillan’s Magazine, 1, 395–402.
- Stock, O., & Strapparava, C. (2003). HAHAcronym: Humorous agents for humorous acronyms. Humor, 16, 297–314.
- Wyer, R., & Collins, J. (1992). A theory of humor elicitation. Psychological Review, 99(4), 663–688.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.