The narratives of novels, films, and plays both reflect and influence human behavior. Authors directly and indirectly theorize on psychological processes in their work, describing how and why people think, feel, and behave the way they do (Cutting, 2016)—leading some to conclude that authors are often more experienced observers of human behavior than psychologists (Rosenberg, 2013). Fictional narratives not only lend insight into human behavior; in some cases, fiction influences behavior as well. Examples include violent media eliciting aggression (Bushman & Anderson, 2001), fan fiction as a gateway to sexual development (Mixer, 2018), and popular television’s influence on accent change (e.g., the London-based series EastEnders has been implicated in rising Cockney accents among young Glaswegians; Stuart-Smith, Pryce, Timmins, & Gunter, 2013). More broadly, reading fiction theoretically involves simulating social worlds, which in turn can improve emotion regulation and social skills (Mar & Oatley, 2008). In part because of the impacts that narrative fiction has on individuals and cultures, it is important to examine the process by which people evaluate films. Understanding the narrative preferences of audiences has relevance for more basic research on human cognition, affect, and behavior, as well. Specifically, studying how language patterns in films relate to critics’ and audiences’ preferences adds to the growing literature on how mathematical properties of attitude objects (e.g., the averageness or typicality of human faces and song lyrics) relate to attitudes (Berger & Packard, 2018; Trujillo, Jankowitsch, & Langlois, 2014). In the present study, we attempt to uncover which types of narrative arcs in films individuals tend to enjoy more, employing novel linguistic measures of genre-typicality adapted from earlier computational linguistic research on narratives in novels and short stories (Blackburn, 2015; Malin et al., 2014).

Narrative Arc Theory

Narrative arcs have been studied for millennia. Aristotle (c. 335 BCE/1961) proposed that fictional plots should have three acts—a beginning, middle, and end—that are causally connected and self-contained, imitating how dramatic events unfold in real life. Centuries later, Aristotle’s ideal narrative was refined and expanded by Freytag (1894) into a five-arc narrative: exposition, rising action, climax, falling action, and resolution or denouement. On the basis of Freytag’s (1894) five-arc narrative, Blackburn (2015) and Malin et al. (2014) developed Narrative Arc Theory, which describes the typical linguistic trajectories found in fictional novels and short stories using the Linguistic Inquiry and Word Count (LIWC; Pennebaker, Booth, Boyd, & Francis, 2015) software, a widely used dictionary-based text analysis program. After dividing each text into five equal segments by word count, LIWC calculates the percentage of words in each segment that fall into various linguistic (e.g., articles, personal pronouns), cognitive (e.g., insight, causation), and affective (e.g., positive emotion, anger) categories.

Narrative Arc Theory has explored how categorization words, narrative action words, cognitive-processing words, and negative and positive emotion words follow prototypical trajectories or arcs throughout fictional narratives (Blackburn, 2015; Malin et al., 2014; see Table 1). Categorization words—made up of articles and prepositions—convey analytic thinking (Pennebaker, Chung, Frazee, Lavergne, & Beaver, 2014) and are highest in the expositional first act of a narrative (Blackburn, 2015; Malin et al., 2014). Constructing the characters and laying out the setting of a narrative requires naming objects or entities (entailing higher rates of articles) and describing how they relate to each other (entailing higher rates of prepositions). Categorization words decrease as the plot advances, because the setting and characters are already known. Following the exposition is the rising action, wherein narrative action words (personal and impersonal pronouns, adverbs, auxiliary verbs, conjunctions, and negations) increase in order to portray known characters (more often referred to by pronouns such as he and she than by name) carrying out actions to advance the plot (Blackburn, 2015; Malin et al., 2014). The rising action leads to the climax, reflected in a peak in cognitive-processing words (including insight, causation, discrepancy, tentativeness, certainty, and differentiation language categories), which help depict characters’ thoughts and motivations. Finally, as the falling action leads to the resolution, rates of both negative and positive emotion words increase, reflecting attempts to find meaning in the narrative as a whole (Blackburn, 2015; Malin et al., 2014).

Table 1. Examples of LIWC 2015 language categories for statistical analysis

An analysis of the same script sample used in the present study recently extended narrative arc theory to films, showing that narrative arcs in films are similar to those found in novels and short stories, with a few key differences (Nalabandian, Iserman, & Ireland, 2018). That work established common narrative arcs across dramatic film scripts but did not find that critics’ or audiences’ attitudes toward films were predicted by the narrative arcs for categorization words, narrative action, cognitive processing, or emotional tone. These null results may have been due to the fact that the analyses did not include genre—a major narrative element—or linguistic measures of typicality, but rather simply regressed audience and critic ratings of films on various language categories as a function of script segment or act. Thus, in the present research we further explored whether and under what conditions films’ popularity relates to genre-typicality in the language categories that define narrative arcs. More specifically, we examined genre-typical language by measuring the frequency of narrative arc theory (Blackburn, 2015; Malin et al., 2014) language categories that are typical of each film’s designated genre throughout the five acts of the film.

Examining film (preferences) through genre

Genre is a cognitive schema that offers individuals information regarding categories of movies, novels, and music (Cutting, 2016; Shevy, 2008). Movie audiences likely use genre in much the same way that people use any other categories in everyday life. Individuals naturally tend to categorize stimuli in order to simplify their environments, which facilitates the development of schemas or mental representations of the world (MacWhinney, 2015). Genres are essentially basic-level (i.e., salient and commonly used) categories that people use to simplify their movie-watching experience—including when deciding which films to watch, forming expectancies, and ultimately judging a film’s quality. As the long history of genre theory attests, taxonomizing fiction is far from straightforward, and many questions remain about the nature of genre, including whether genres are natural kinds (having objective reality independent of human perception) or socially constructed, and thus tied to particular cultures and times (Chandler, 1997; Frow, 2014).

At minimum, researchers interested in the intersection of art and behavioral science tend to agree that a folk psychology of genre guides consumer behavior. That is, people have relatively reliable intuitions about what various fictional genres entail and what kinds of fiction they enjoy, and people use those expectations to guide them to the narratives they seek out in everyday life (Fong, Mullin, & Mar, 2013). Once the genre of a novel or film becomes apparent, the reader or viewer anticipates a storyline consistent with knowledge of the genre (e.g., expecting tragedies to include major character deaths; Cutting, 2016; Leavitt & Christenfeld, 2011), allowing audiences to roughly understand the storyline of a film before they view it. Film and literature theorists suggest that the film industry produces narratives that, in part, conform to specific genres in order to coincide with audience expectancies, maximizing their enjoyment (Altman, 1984). In psychological terms, a movie’s similarity to other films in its genre is bound to influence processing fluency—that is, the degree to which viewers find the narrative easy (familiar and expected) or difficult (unfamiliar and unexpected) to process (Reber, Winkielman, & Schwarz, 1998).

Processing fluency and disfluency

Ease of processing, or processing fluency, has profound and complex effects on attitudes and decision-making. Early research on processing fluency established that individuals tend to like stimuli that are easier to process across multiple contexts, preferring familiar images (Winkielman, Schwarz, Fazendeiro, & Reber, 2008), images viewed over longer periods (Forster, Leder, & Ansorge, 2013; Reber et al., 1998), sans serif or Times fonts (Kaspar, Wehlitz, von Knobelsdorff, Wulf, & von Saldern, 2015; Reber, Wurtz, & Zimmermann, 2004b), legible handwriting (Greifeneder et al., 2010), faces with mathematically average or symmetrical features (Reber, Schwarz, & Winkielman, 2004a; Winkielman et al., 2008), prototypical animals (Halberstadt & Rhodes, 2000), and recently primed product labels (Labroo, Dhar, & Schwarz, 2007) over less fluent stimuli—effects that are related to but not wholly accounted for by perceived familiarity (Halberstadt, 2006). Processing fluency influences reasoning, as well, and is likely the root of the availability heuristic, in which the likelihood of a statement is evaluated based on the ease with which a person can generate confirmatory examples or evidence (Schwarz et al., 1991; Tversky & Kahneman, 1973).

Recent research has suggested that there are benefits to disfluency in some contexts. For example, an experiment on fonts that differed in legibility revealed a quadratic relationship between legibility and both recall and transfer performance; that is, learning performance increased if the text was slightly illegible, but decreased at the highest level of illegibility (Seufert, Wagner, & Westphal, 2017). Along the same lines, difficult-to-process fonts increase individuals’ ability to solve riddles, such as the Moses illusion (“How many animals of each species did Moses bring on the Ark?”), that rely on shallow reading of text (Song & Schwarz, 2008). In consumer behavior, as well, people seem to prefer products with more challenging or disfluent descriptions when seeking items for special occasions, theoretically due to individuals’ beliefs that exclusive products should be difficult to obtain (Pocheptsova, Labroo, & Dhar, 2010). Altogether, the literature indicates that disfluent stimuli may force individuals to engage with the material at a deeper level, yielding greater learning and more persuasive advertising in cases where people are motivated to think critically or expect to be challenged.

Complementing research on the role of fluency in reading and persuasion, results have differed on how and under what circumstances more or less fluent visual art is preferred. Theoretically, art that challenges processing fluency may be favored because it encourages individuals to broaden how they perceive the piece and, perhaps as a result, their environments (Belke, Leder, & Carbon, 2015). Yet there is some evidence that the appeal of fluency carries over into visual art, at least for certain styles of paintings: For representational visual art (depicting real-world objects or events), processing fluency seems to increase liking, but enjoyment of abstract art is unrelated to processing fluency (Belke, Leder, Strobach, & Carbon, 2010). The way that individuals process visual art additionally influences whether people prefer challenging or easy-to-process pieces, suggesting that challenging paintings are appreciated only when people elaborate on and develop mastery over a particular painting or artist’s style (Belke et al., 2015; Gerger, Forster, & Leder, 2017). Similarly, visually complex works of art are preferred over less complex pieces, but only when individuals are able to easily attribute meaning to the visually complex stimuli (i.e., low perceptual fluency paired with high conceptual fluency; Ball, Threadgold, Marsh, & Christensen, 2018). That is, conquering disfluency—or making sense of previously disfluent sensory experiences—may be one pathway through which difficult art is perceived as more rewarding than simpler art that requires less initial effort to enjoy (Ball et al., 2018; Graf & Landwehr, 2015).

Processing fluency has mixed associations with music preferences, as well. In a recent study that examined linguistic fluency through repeatedly exposing participants to music lyrics, a higher number of chorus repetitions as well as a higher Hirsch–Popescu point (indicative of repetitive words) predicted top-rated songs and songs achieving top status more quickly (Nunes, Ordanini, & Valsesia, 2015). However, songs with too many repetitive words were less successful (Nunes et al., 2015). Adding to these findings, by analyzing different genres of popular music separately, Berger and Packard (2018) found that songs in most genres—with the exception of pop and dance—are more popular, as indexed by digital download rates, when their lyrics are more mathematically distinct from the average lyrics in their respective genres.

Most germane to the present research, subjective experiences of processing fluency or disfluency likely inform individual preferences for film, too. Outside of psychology, literary and film critics seem to firmly support the argument that the best movies challenge viewers to broaden their perspectives. Film critics are particularly dismissive of formulaic or predictable movies, often categorically deriding any popular films or movies made by large Hollywood studios as “genre films” (Chandler, 1997, p. 6). Renowned film critic Roger Ebert argued that “the odds against making a truly good movie are discouraging in Hollywood, which uses formulas and deals and habit patterns to push even the most original projects into narrow channels” (Ebert, 2017, p. 83). Consistent with the finding that people like stimuli that are easier to process, films from less familiar genres tend to have lower box office revenues (unless the films included famous actors or actresses and were highly rated by film critics), whereas the box office revenues of films in more familiar genres were not affected by star power or critic ratings (Desai & Basuroy, 2005). People additionally prefer to learn central plot information in advance (or to read “spoilers”) in fantasy/thriller films but not comedies, perhaps because fantasy and thriller genres are expected to be more complex than comedies (Johnson & Rosenbaum, 2018). Finally, people are less likely to favor films with elements that correspond to multiple genres, potentially because films that are difficult to categorize are less easily processed (Hsu, 2006). Such results suggest that whether fluency increases or decreases viewers’ attitudes toward a film may depend on genre—specifically, on whether the genre itself is typically easy or difficult to process.

To summarize, although the positive effects of processing fluency remain robust across domains (Winkielman et al., 2008), some degree of disfluency appears to be helpful or appealing in cases where stimuli benefit from deeper processing (Graf & Landwehr, 2015)—either because they are meant to challenge conventions (e.g., rock or rap lyrics, in Berger & Packard, 2018) or because they require critical thinking (e.g., riddles, in Song & Schwarz, 2008). Therefore, the present study extends previous research concerning processing fluency and art by examining how audience attitudes relate to genre-typicality in language categories identified as being central to the narrative arcs of fictional writing (Blackburn, 2015; Malin et al., 2014) and dramatic films (Nalabandian et al., 2018).

Critic versus audience film preferences

Although disfluency in the arts appears to predict positive evaluations of stimuli, alternative factors could moderate these effects. Individual differences in expertise, motivations, and roles influence the effect of processing fluency on attitudes. At the level of categorization, novices and experts differ in what they consider to be a basic-level category, with novices more commonly using relatively broad superordinate categories (e.g., horror, thriller) and experts more readily using finer-grained subordinate categories (e.g., Hammer horror or giallo; Grey, 2016; Tanaka & Taylor, 1991). Ease of processing—and, by connection, the relation between processing fluency and attitudes—likely differs as a function of expertise. For instance, critics may identify and become bored by narrative formulas in films at faster rates than do audiences, who may be less familiar with a given formula or trope.

Motivation additionally influences individuals’ attitudes, partly via changing expectancies (Freitas, Azizian, Travers, & Berry, 2005). Perhaps novice audiences’ enjoyment of movies will be bolstered by processing fluency not only due to their relative naïveté, but also due to the mindsets (goals and expectancies) they bring with them into a movie-viewing experience. Irrespective of genre, audiences’ and professional critics’ film preferences differ. Audiences favor entertaining blockbuster films that are theoretically easier to process, and critics more frequently prefer complex, artistic films (Austin, 1983; Holbrook, 1999; Simonton, 2009, 2011; Wanderer, 1970). Differences in audience and critic preferences may be explained by their locomotion versus assessment mindsets. A person with a locomotion mindset is focused on the movie-viewing experience itself (watching a movie to enjoy the film), whereas a person with an assessment mindset is focused on evaluating that experience (watching a movie to judge it; Avnet & Higgins, 2003; Finkel, Eastwick, Karney, Reis, & Sprecher, 2012; Gollwitzer & Bayer, 1999). If typical nonexpert audience members tend to have a locomotion mindset, they might prefer entertainment characterized by higher processing fluency (more prototypical, formulaic films). By the same logic, if professional film critics have an assessment mindset, then they would prefer stimuli that would challenge them (more atypical, less formulaic films).

Audiences’ and professional critics’ differing perceptions of movies—theoretically perpetuated by their locomotion (film as entertainment) and assessment (film as art) mindsets, respectively—may be magnified in genres that are typically expected to be more entertaining or less complex. Expectations about whether a film will be entertaining or challenging differ across genres, with action movies stereotypically prioritizing entertainment and tragedies more often aiming to be artistic (Eliashberg, Hui, & Zhang, 2007). Therefore, in addition to predicting that audiences would prefer genre-typicality and critics would prefer genre-atypicality in films, we expected genre to moderate those effects. Specifically, we expected differences between audiences’ and critics’ preferences for genre-typicality or genre-atypicality, respectively, to be stronger for genres that aim to entertain (action/adventure, comedy, romance, family/kids) rather than to challenge (history/war, tragedy, science-fiction/fantasy, thriller/suspense; Eliashberg et al., 2007; Johnson & Rosenbaum, 2018).

Hypotheses

Despite masses of research on the cognitive architecture of categories and how category membership (especially with respect to averageness or typicality) influences attitudes, to our knowledge no research to date has tested how genre-typicality, as indexed by quantitative language patterns, relates to film preferences. Although there are many ways to operationally define films’ prototypicality or averageness within genres, we focused on simple quantitative measures of semantic (the setting, emotions, and action of a narrative) and syntactic (function word usage within the narrative; Altman, 1984) narrative arc typicality within hand-coded basic-level genres. The use of automated text analysis in the behavioral and computer sciences has rapidly gained momentum in the last decade (Herrmann, van Dalen-Oskam, & Schöch, 2015; Iliev, Dehghani, & Sagi, 2015); however, only a few studies have analyzed narrative or fictional samples with the aim of testing psychological hypotheses (e.g., Berger & Packard, 2018; Danescu-Niculescu-Mizil & Lee, 2011; Iliev, Hoover, Dehghani, & Axelrod, 2016). In the present study, we first analyzed dramatic movie scripts using dictionary-based computerized text analysis and then used profile correlations to assess whether genre-typical arcs in several language dimensions from narrative arc theory (categorization, narrative action, and cognitive processing, as well as negative and positive emotional language; Blackburn, 2015; Malin et al., 2014; Nalabandian et al., 2018) differentially predict audience and critic ratings.

In summary, we did not have specific predictions about which language dimensions would show the strongest fluency effects or would be most clearly moderated by rater role; rather, we used Narrative Arc Theory to identify which specific language categories would be most central to individuals’ understandings of filmic narratives. On the basis of processing fluency research and genre theory, we predicted that (H1) critics would prefer atypicality (i.e., the unfamiliar and unexpected) and audiences would prefer typicality (i.e., the familiar and expected), with respect to each genre’s average linguistic narrative arcs in the present sample, and (H2) the associations between genre-typicality and movie ratings would be stronger for more stereotypically formulaic genres, such as action/adventure, romance, comedy, and family/kids films. In other words, critics would prefer less genre-typical language and audiences would prefer more genre-typical language in the aforementioned genres. To test our hypotheses, several linear models (one per genre) regressed ratings of film quality on measures of linguistic genre-typicality, including rater role (audience or critic) as a moderator.

Method

Researchers focusing on the study of film and, in particular, individuals’ perceptions of films have increasingly begun to utilize big-data approaches. Mostly, studies obtain massive samples of descriptive data on films by means of websites such as the Internet Movie Database (IMDb), Rotten Tomatoes, and MovieLens (Desai & Basuroy, 2005; Hsu, 2006; Hwang, Park, Hong, & Kim, 2016; Ramos, Calvão, & Anteneodo, 2015). The present study applies similar big-data practices, albeit on a more manageable scale that allowed for detailed hand-coding of films’ genres, writer characteristics, and ratings. Here we focused primarily on temporal language patterns in scripts (referred to as narrative arcs), as well as audience and critic ratings.

The sample included 509 scripts from the Internet Movie Script Database’s (IMSDb’s) drama category, for films released between 1932 and 2014. IMSDb is a resource that allows one to freely download or view film scripts. The scripts available on IMSDb are not transcribed, edited, or annotated by fans; rather, they represent authentic film scripts written by expert screenwriters. Consistent with gender disparities in Hollywood (see Conor, Gill, & Taylor, 2015), the large majority of scripts were written by individual men or all-male writing teams (87.4%), followed by individual women or all-female writing teams (7.1%), and mixed-sex writing teams (5.5%). Analyses processed all text from the scripts, including both screen directions and dialogue, on the basis that key narrative elements—particularly descriptions of characters’ affect or thought processes (e.g., angrily, thoughtful pause)—are often included parenthetically, without being explicitly discussed in dialogue.

We excluded 25 scripts from the analysis because they were formatted as images that were not amenable to optical character recognition (n = 3), had fewer than 20 user or critic reviews on Rotten Tomatoes and IMDb (n = 21), or had no dialogue to analyze (n = 1; i.e., The Artist, 2011).

Analytic strategies

Text analysis

The Linguistic Inquiry and Word Count (LIWC; Pennebaker et al., 2015; version 1.3.1) computer program was used to determine the rate of specific word categories for each film script. LIWC measures 125 different grammatical (e.g., pronoun, article), psychological (e.g., positive emotion, power), and topical (e.g., religion, death) language categories containing over 6,500 unique terms. Every language category includes a list of words and word stems, with the goal of representing commonly used words in that category. LIWC compares every word in a given text against those internal dictionaries and then outputs the percentage of the total words in each language category.
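The dictionary-matching step described above can be illustrated with a minimal sketch. It is not the LIWC implementation: the three-category dictionary, the tokenizer, and the function names are hypothetical stand-ins, and the only behavior taken from the text is that each word is compared against category word lists (including word stems) and that the output is the percentage of total words falling in each category.

```python
import re

# Hypothetical mini-dictionary for illustration only; the actual LIWC 2015
# dictionaries are far larger. A trailing "*" marks a word stem.
TOY_DICTIONARY = {
    "article": ["a", "an", "the"],
    "posemo": ["happy", "love*", "nice"],
    "negemo": ["hate*", "sad", "angr*"],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def matches(token, entry):
    """True if the token equals the entry, or begins with it for stem entries."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_percentages(text, dictionary=TOY_DICTIONARY):
    """Return, for each category, the percentage of all tokens that match it."""
    tokens = tokenize(text)
    if not tokens:
        return {category: 0.0 for category in dictionary}
    result = {}
    for category, entries in dictionary.items():
        # Count each token at most once per category.
        hits = sum(any(matches(tok, e) for e in entries) for tok in tokens)
        result[category] = 100.0 * hits / len(tokens)
    return result
```

For example, `category_percentages("The sad dog loved a happy cat")` counts two of the seven tokens as articles, two as positive emotion words, and one as a negative emotion word.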

In this study, we focused on five specific language dimensions identified as central to narrative in past work on Narrative Arc Theory (Blackburn, 2015; Malin et al., 2014; Nalabandian et al., 2018): categorization words, narrative action words, cognitive-processing words, positive emotion words, and negative emotion words. Categorization and narrative action words are composite variables that incorporate various standardized language categories. The remaining language variables are each representative of one LIWC category. All 509 drama film scripts included in the analyses were split by word count into five equal segments (or acts), in order to assess the trajectory of each language category across the course of each script, as was recommended by both Blackburn (2015) and Malin et al. (2014). Other theories of narrative that focus on the visual elements of film (e.g., shot durations, scene transitions, music) have argued for the existence of four or six arcs in films (depending on whether a prologue and epilogue are included; Cutting, 2016). Because the present research is centered on linguistic analyses of film scripts and proposes to further extend Blackburn’s (2015) and Malin et al.’s (2014) Narrative Arc Theory, in the present study we examined scripts using the five-arc format.
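As a rough illustration of the segmentation step, the sketch below splits a tokenized script into five contiguous segments of near-equal word count. The function name is hypothetical, and the handling of word counts that are not divisible by five is an assumption, since the text does not say how remainders were treated.

```python
def split_into_segments(words, n_segments=5):
    """Split a list of words into contiguous segments of near-equal length.

    Assumption: when len(words) is not divisible by n_segments, segment
    sizes differ by at most one word; the original procedure may differ.
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    total = len(words)
    # Integer boundaries avoid floating-point rounding surprises.
    bounds = [(i * total) // n_segments for i in range(n_segments + 1)]
    return [words[bounds[i]:bounds[i + 1]] for i in range(n_segments)]
```

Each segment can then be scored separately, yielding one value per act for every language category.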

Language categories

Table 1 displays examples of words that make up the aforementioned language categories. The categorization language composite was created by standardizing (z-scoring) and averaging the percentages for articles and prepositions (standardized Cronbach’s α = .74) within scripts. Similarly, the narrative action composite was computed by averaging the standardized percentages for impersonal pronouns, personal pronouns, adverbs, auxiliary verbs, conjunctions, and negations (α = .84) in all scripts. Cognitive-processing words consisted of the insight, causation, discrepancy, tentativeness, certainty, and differentiation language categories. Finally, positive and negative emotion words were standardized as distinct variables.
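The composite construction can be sketched as follows, assuming each language category is stored as a list of per-script percentages. The helper names are hypothetical, and the use of the sample standard deviation for z-scoring is an assumption. Standardized Cronbach's alpha is computed from the mean inter-item correlation, alpha = k·r̄ / (1 + (k − 1)·r̄), which is the standard formula and not necessarily the exact routine used in the original analysis.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite(*columns):
    """Average the z-scored columns row by row (one row per script)."""
    z_columns = [zscores(col) for col in columns]
    return [mean(row) for row in zip(*z_columns)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def standardized_alpha(*columns):
    """Standardized Cronbach's alpha from the mean inter-item correlation."""
    k = len(columns)
    correlations = [
        pearson(columns[i], columns[j])
        for i in range(k)
        for j in range(i + 1, k)
    ]
    r_bar = mean(correlations)
    return k * r_bar / (1 + (k - 1) * r_bar)
```

With two items, as in the categorization composite, the formula reduces to 2r / (1 + r), where r is the correlation between article and preposition rates.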

Film ratings

Film ratings were collected from Rotten Tomatoes as well as IMDb. We chose those sites over other options, such as Metacritic or Fandango, due to their simpler and more transparent rating algorithms. For instance, the ratings on Fandango appear to attribute at least three stars out of five for most films, perhaps to increase ticket revenue (Hickey, 2015). Metacritic weights critic ratings in nontransparent ways, primarily to give more influence to better-known critics. In contrast, IMDb utilizes weighted averages for audience ratings in order to limit errors (e.g., repeated ratings by the same user or extreme outliers). Rotten Tomatoes forgoes weighting and instead assigns a score (ranging from 1 to 5 for audience ratings and 1 to 10 for critic ratings) to each review, reporting the proportion of total reviews that are positive—or “fresh”—as the percent rating (see https://www.rottentomatoes.com/about/).

Critic ratings

Critic ratings were obtained from Rotten Tomatoes. Critics ranged from those associated with high-impact online and print news sources, such as The Washington Post and Rolling Stone, to more specialized movie review and entertainment news sites, such as Emanuel Levy or Common Sense Media. The percentage of positive critic ratings as well as the critic ratings out of ten were both standardized and averaged into a composite variable for critic ratings (α = .98).

Audience ratings

Audience ratings were also obtained from Rotten Tomatoes and IMDb. The ratings from Rotten Tomatoes (percentage of positive ratings and ratings out of five) and IMDb (a weighted average) were each standardized and subsequently averaged into a composite variable for audience ratings (α = .95).

Genre

Although the present corpus of films was collected from the IMSDb’s drama category, most of the films could be subcategorized into different genres, as well (e.g., Knocked Up as a comedy). Therefore, the sample of film scripts was coded into several subgenre categories by two research assistants who conducted in-depth searches on Rotten Tomatoes and IMDb for each film. Because both Rotten Tomatoes and IMDb identify multiple genres for films, the raters determined one genre—from the various genres specified on Rotten Tomatoes and IMDb—that best represented each film in the present study’s sample. Subgenres for the majority of films were agreed upon by the two raters. A third rater acted as the tie-breaker for films in which the initial two raters reported discrepant subgenres. From the three raters’ efforts, the films were all coded into eight subgenres: comedy (n = 73), romance (n = 70), thriller/suspense (n = 134), action/adventure (n = 77), science-fiction/fantasy (n = 58), history/war (n = 68), tragedy (n = 18), and family/kids (n = 11). The unequal sample sizes across subgenre categories arose because the present sample was originally collected for previous research examining the linguistic arcs of narrative in film (Nalabandian et al., 2018). In the present study, we attempted to extend Nalabandian et al.’s (2018) work concerning which linguistic narrative arcs are preferred by audiences versus professional critics.

To facilitate the process of identifying one genre—from the multiple genres listed on Rotten Tomatoes and IMDb—for every film, raters were asked to select which subgenre was the most predominant descriptor of each film based on the following criteria illustrated by professional film critic Tim Dirks (2018a, 2018b, 2018c): (1) Films with a main storyline involving love and breakups were identified as romances, (2) films intended to be funny were identified as comedies, (3) films with fictional characters in past wars or other real historical settings (i.e., set more than a decade prior to the film’s release date) were identified as historical fiction, (4) films featuring imagined future technology, magical/super powers, or mythical creatures were identified as science fiction or fantasy, (5) fast-paced films with violence—usually with car chases and lone or semi-lone heroes triumphing over a large number of bad guys—were identified as action and adventure, (6) films with mysteries unresolved until the end of the film, with surprise twists, or with an eerie atmosphere were identified as thriller and suspense, (7) films in which predominantly bad things happen to good characters or heroes, with little or no redemption at the end—often involving the deaths of most or all main characters—were identified as tragedies, and (8) films aimed at entertaining children were categorized as family and kids films.

Genre-typical language

Genre-typicality was operationalized as the profile correlations calculated for each of the five narrative-arc-related language categories (categorization, narrative action, cognitive processing, negative emotion, and positive emotion) between each film’s trajectory over the course of the five segments (or acts) and the average values for each segment within each genre. That is, after creating the composite language variables for categorization and narrative action, we calculated individual category-level profile correlations across the five acts between each movie and the mean for that category in the movie’s genre. Those category-level profile correlations were then transformed using Fisher’s z. The Fisher (1921) z-transformation is used to convert correlations (which are bounded by –1 and 1) to an unbounded continuous scale and to stabilize the variance. The resulting number reflects how closely the arc—or trajectory over the course of a film’s five acts—for a given film’s composite narrative language category matches its genre’s average or typical arc for that same category.
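The genre-typicality measure can be sketched as a profile correlation followed by Fisher's z, where z = atanh(r) = 0.5·ln((1 + r) / (1 − r)). The sketch makes several assumptions not stated in the text: the function names are hypothetical, the genre mean includes the focal film, and correlations of exactly ±1 (possible with only five points) are nudged inside the open interval so that the transformation stays finite.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a flat profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def genre_mean_arc(arcs):
    """Average each segment (act) across all films in one genre."""
    return [sum(segment) / len(segment) for segment in zip(*arcs)]


def fisher_z(r, eps=1e-12):
    """Fisher z-transformation; clamping at +/-(1 - eps) is an assumption."""
    r = max(-1 + eps, min(1 - eps, r))
    return math.atanh(r)


def genre_typicality(film_arc, genre_arcs):
    """Fisher-z profile correlation between a film's arc and its genre's mean arc."""
    return fisher_z(pearson(film_arc, genre_mean_arc(genre_arcs)))
```

Here `film_arc` holds one language category's five per-act values for a single film, and `genre_arcs` holds the corresponding five-value arcs for every film in that film's genre. Positive values indicate an arc that rises and falls with the genre's average arc; negative values indicate the opposite pattern.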

Analytic strategy

To test whether audiences and critics differed in their preferences for genre-typical language patterns, we regressed the film ratings on Fisher’s z-transformed profile correlations and rater role (audience or critic) separately for each genre: Ratings ~ Rater × Genre-Typicality, where Ratings is the continuous outcome (standardized film ratings), Rater is the categorical variable indicating whether a given rating came from audiences or critics, and Genre-Typicality is the Fisher’s z-transformed profile correlation between a film and its genre’s typical pattern for the specific language category. All analyses were conducted in R (version 3.4.3; R Core Team, 2017), and mixed-effects models were fit with the lme4 package (version 1; Bates, Maechler, Bolker, & Walker, 2015).
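For readers who prefer the model written out, the formula above can be expanded as follows for one genre and one language category. This is a sketch under two assumptions that the text does not state: Rater is dummy-coded (0 = audience, 1 = critic), and the mixed-effects model includes a random intercept u_i for each film i, because every film contributes both an audience and a critic rating.

```latex
\begin{align*}
z_i &= \operatorname{artanh}(r_i) = \tfrac{1}{2}\ln\frac{1 + r_i}{1 - r_i}
  && \text{(Fisher's $z$ of film $i$'s profile correlation $r_i$)} \\
\text{Ratings}_{ij} &= \beta_0 + \beta_1\,\text{Rater}_j + \beta_2\, z_i
  + \beta_3\,(\text{Rater}_j \times z_i) + u_i + \varepsilon_{ij} \\
\text{audience slope} &= \beta_2
  && (\text{Rater}_j = 0) \\
\text{critic slope} &= \beta_2 + \beta_3
  && (\text{Rater}_j = 1)
\end{align*}
```

Under this coding, the interaction coefficient β3 is the difference between the critic and audience slopes, and the simple slopes are the two quantities in the last two lines. A different coding scheme (e.g., effect coding) would change the interpretation of β2 but not the test of the interaction.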

Results

The interaction of rater role and genre-typical language, which tests the prediction that the relation between language typicality and film ratings differs between audiences and critics, was significant only for genre-typical positive emotion language (Table 2). However, simple slope analyses were nonsignificant for both critic (B = −.05, SE = .04, t(507) = −1.03, p = .304, 95% CI [−.13, .04]) and audience (B = .04, SE = .04, t(507) = 0.81, p = .417, 95% CI [−.05, .12]) ratings. Subsequent analyses testing the hypothesis that genre would play a role in the audience and critic ratings yielded a significant interaction only for genre-typical positive emotion language in action/adventure films (B = −.25, SE = .08, t(75) = −3.22, p = .002, 95% CI [−.41, −.10]; all other ps > .10; see Table 3). Simple slope analyses revealed that critics (B = −.25, SE = .12, t(75) = −2.14, p = .035, 95% CI [−.48, −.02]), but not audiences (B = .01, SE = .11, t(75) = 0.07, p = .948, 95% CI [−.22, .24]), rated action/adventure films with less genre-typical positive emotion trajectories more highly (see Fig. 1). These findings are consistent with the prediction that critics prefer less genre-typicality in films, particularly in formulaic genres such as action/adventure.

Table 2 Rater role-by-genre interaction effects predicting film ratings
Table 3 Rater role-by-genre interaction effects predicting film ratings, subset by genre
Fig. 1 Action/adventure movie ratings as a function of rater role and genre-typical positive emotion language

Because our predictions regarding divergent preferences between audiences and critics for genre-typicality were corroborated solely for positive emotion language (not for categorization, narrative action, cognitive processing, or negative emotion language), we conducted additional analyses examining the main effects of genre-typicality independent of rater role (Table 4). Controlling for rater role, more genre-typical categorization language predicted significantly lower ratings of action/adventure films (B = −.33, SE = .11, t(75) = −3.12, p = .003, 95% CI [−.54, −.12]) and marginally lower ratings of romances (B = −.22, SE = .11, t(68) = −1.95, p = .055, 95% CI [−.45, .01]; see Fig. 2). Similarly, genre-typical positive emotion language predicted lower ratings of history/war films (B = −.24, SE = .10, t(66) = −2.32, p = .023, 95% CI [−.45, −.03]; see Fig. 3). Genre-typical narrative action language predicted marginally lower ratings for both action/adventure (B = −.19, SE = .10, t(75) = −1.89, p = .063, 95% CI [−.40, .01]) and history/war films (B = −.16, SE = .09, t(66) = −1.79, p = .078, 95% CI [−.34, .02]; see Fig. 4). In contrast, both genre-typical positive emotion (B = .60, SE = .25, t(9) = 2.37, p = .042, 95% CI [.03, 1.17]; see Fig. 3) and negative emotion language (B = .64, SE = .27, t(9) = 2.34, p = .044, 95% CI [.02, 1.26]; see Fig. 5) predicted higher ratings of family/kids films.

Table 4 Main effects of genre-typicality on film ratings after controlling for rater role, subset by genre
Fig. 2 Main effects of genre-typical categorization language for action/adventure and romance films

Fig. 3 Main effects of genre-typical positive emotion language for family/kids and history/war films

Fig. 4 Main effects of genre-typical narrative action language for action/adventure and history/war films

Fig. 5 Main effect of genre-typical negative emotion language for family/kids films

The main effects for linguistic genre-typicality suggest that, regardless of whether the rater was an audience member or a professional critic, viewers tended to prefer less genre-typical categorization, narrative action, and positive emotion language in action/adventure, romance, and history/war films. These results are inconsistent with the prediction that audiences would prefer genre-typicality within films in general. Ratings of family/kids films, by contrast, were higher when both positive and negative emotion language were more genre-typical. Violating expectancies by deviating from narrative formulas within this genre may be upsetting for children, who are only beginning to develop their views of themselves and the world around them. However, given that there were only 11 family/kids films in our sample, any results for that genre should be treated with caution until the same pattern is replicated in larger and more representative samples.

Discussion

In the present research, we examined how genre-typical language—specifically, narrative arc language categories implicated in novels, short stories (Blackburn, 2015; Malin et al., 2014), and film (Nalabandian et al., 2018)—predicted film ratings. Partly consistent with our predictions, professional critics preferred less genre-typical positive emotion trajectories in action/adventure films. This finding coincides with research suggesting that critics may be drawn to more complex and artistic films (Austin, 1983; Holbrook, 1999; Simonton, 2009, 2011; Wanderer, 1970) that are likely to challenge the viewer with less typical or formulaic stimuli. Affectively challenging films elicit aversive or inconsistent emotions, whereas positive emotion in film is less challenging to process (Eden, Johnson, & Hartmann, 2018), perhaps explaining critics’ preference for less genre-typical positive emotion in film, particularly in action/adventure films known to follow a narrative formula. In other words, genres—or schematic information based on past experiences—guide the viewer in what to expect from a film, but when such information is inconsistent with one’s schema, expectancies are subverted, and the viewer must work to further process the information. Being film experts, critics may also be more aware of—and thus perhaps more bored by—typical positive emotion trajectories within film genres, relative to more naïve audience members (Belke et al., 2010; Oppenheimer, 2008). Nevertheless, differences between audience and professional critic film preferences only yielded significant results for positive emotion language, leading us to suspect audience and critic preferences differ less than past research has suggested.

Exploratory analyses controlling for film rater role (audience vs. critic) revealed several significant main effects for genre-typical language. Contrary to our hypotheses and prior research indicating that individuals prefer stimuli that can be processed more fluently (Winkielman et al., 2008), both audiences and professional critics preferred less genre-typical (and theoretically harder-to-process) language patterns in several genres. Less typical categorization, narrative action, and positive emotion language patterns were preferred in romance, action/adventure, and history/war films, all genres that are sometimes criticized for being overly formulaic. Although we did not predict that genre-typicality would be more appealing for any one genre or set of genres, only family/kids films received higher ratings when they adhered to a genre-typical emotional tone. Since children are the intended audience of family/kids films, facilitating comprehension of the filmic narrative via genre-typical language use may be particularly important for this genre. Children may not have yet developed the ability to appreciate the challenge of interpreting more disfluent stimuli, a possibility partly supported by results demonstrating that second graders have difficulty processing smaller font sizes, whereas fifth graders process them with ease (Katzir, Hershko, & Halamish, 2013).

Although much of the processing fluency literature supports the conclusion that people prefer stimuli that are easy to follow and comprehend, recent research (e.g., Belke et al., 2015) suggests that such findings may not generalize to art, which is often celebrated for being novel or iconoclastic. Art that is more difficult to process allows the individual to encounter an unexpected, novel stimulus that elicits curiosity and provides an opportunity to learn from and attribute meaning to the aesthetic experience (Belke et al., 2015). Psychological research on literature and film presents a more complex picture, however, suggesting that audiences have traditionally preferred narrative structures that are easy to process (Cutting, 2016; Thompson, 1999) but are progressively gaining appreciation for complex, more artistic films that deviate from popular norms (Mittell, 2006). For example, television shows are increasingly abandoning standalone episodes that require no previous knowledge of the show in favor of adding to season- or series-spanning narrative arcs with every episode. This change is partly attributable to technological advances in digital recording and streaming that allow viewers to rewatch and minutely dissect complex scenes (similar to rereading a complex passage in a novel; Mittell, 2006). Perhaps critics’ and audiences’ shared preferences for less typical narrative arcs are related to these technology-driven cultural shifts in how people view films.

Implications of computerized text analysis

The present study represents a broad, theory-based snapshot of how genre-typicality relates to audiences’ and critics’ attitudes toward films. Although our results are informative and contribute to the literature on attitudes and processing fluency in art, there is clearly much work to be done on finer-grained—or more holistic—aspects of genre-typicality. For example, Berger and Packard (2018) took a more data-driven approach to analyzing creative work, operationalizing lyrical genre-typicality as the strength of the correlation between songs and genre means across several latent Dirichlet-allocation-derived topics (Blei, Ng, & Jordan, 2003) and LIWC categories. At the other end of the spectrum, it may be interesting to explore how the trajectories of single, psychologically telling words—such as subjective (I) versus objective (me) first-person singular pronouns—relate to audience enjoyment (Zimmermann, Brockmeyer, Hunn, Schauenburg, & Wolf, 2017). Bringing this paradigm into the laboratory could be informative, as well, in terms of assessing how personality, intelligence, demographics, or other individual differences relate to individuals’ enjoyment of typical or atypical narrative arcs. Even our analysis of genre itself was relatively simple and can potentially be bolstered by more data-driven methods of determining which genre or genres a movie belongs to.

A broad aim of this project was to spark other behavioral scientists’ interest in using naturalistic data and computational linguistic tools to study psychological phenomena. Movies are an especially valuable, and often overlooked, source of real-life archival data. Not everyone reads print publications, but nearly every person alive today—of all ages, and across developed and developing nations—regularly consumes television or movies of some kind, with an estimated 119.6 million homes in the United States alone currently having televisions (Nielsen, 2017). Since film is a highly profitable and globalized industry, copious and well-curated records for film expenditures, theatrical releases, and sales are often freely available to the public (e.g., from Box Office Mojo, http://www.boxofficemojo.com, operated by IMDb).

The importance of naturalistic behavioral data in the social and behavioral sciences cannot be overstated. The field of social psychology, in particular, rose out of researchers’ desire to understand how common people, without the excuse of psychopathy or other mitigating factors, could contribute to the unthinkable horrors of the early 20th century, including the World Wars, the Holocaust, and the Armenian genocide (Perry, 2018; Russell, 2011). Theories that work well in the laboratory are worth very little if they do not help predict, understand, and, when necessary, influence real-life human behavior (Allport, 1919). Despite that widely accepted tenet, most research on psychological phenomena today relies on laboratory or online surveys and experiments, with an occasional field study to validate results demonstrated dozens of times in tightly controlled lab settings. Although several counterexamples to these generalizations have generated considerable excitement (e.g., Brown, Blake, & Sherman, 2017; Mehl, 2017; Pennebaker et al., 2014; Rocklage & Fazio, 2015; Youyou, Kosinski, & Stillwell, 2015), studying psychological phenomena in the real world remains relatively rare in psychology. Analyzing natural language use, particularly fictional language use, is rarer still in psychology departments, despite fictional texts’ psychometric value as a source of abundant quantifiable behavioral data. In this respect, researchers from linguistics (e.g., Davies, 2008) and computer science (e.g., Danescu-Niculescu-Mizil & Lee, 2011; Pakhomov, Chacon, Wicklund, & Gundel, 2011) have been more adventurous, although computational linguists are traditionally trained to prioritize prediction and classification over psychological insights (for counterexamples, see computational social scientists; e.g., Boghrati, Hoover, Johnson, Garten, & Dehghani, 2018; Lazer et al., 2009; Park et al., 2015).

Conclusion

The present study provides evidence for the utility of a novel method of measuring genre-typicality within narrative arcs using dictionary-based text analysis. The computerized text analysis and profile correlation methods used in the study are face-valid, transparent, and relevant to theories spanning a number of fields, ranging from social and cognitive psychology to literary and film criticism; as a result, the methods developed in this project have the potential for wide use across diverse behavioral science fields. A more abstract aim of this work is to encourage researchers in psychology and related fields to experiment with similar methods (dictionary-based computerized text analysis, particularly of fictional language use) in their own research. Through the computerized analysis of fictional dialogue, researchers interested in perspective-taking can explore how various groups of authors have imagined the minds of people from different social and demographic groups (Ireland, Davis, Schumacher, & Pennebaker, 2018). Cognitive and affective scientists can explore how fiction relates to empathy and social simulation (Mar & Oatley, 2008). Experts on culture and sociology can test how art drives and reflects social change (Michel et al., 2011). Creativity researchers can study the semantic signature of creative or atypical narratives (Gray, 2018). People who care about attitudes and personality can assess what language topics or narrative arcs are most appealing to various groups of people, as we aimed to do here. The questions that can be explored by computerized analyses of fictional text samples are virtually unlimited (see Pennebaker & Ireland, 2011).

Dictionary methods, which rely on top-down word lists that were handmade and validated by psychologists, are blunt instruments that only indicate how broad topics or patterns of behavior relate to the outcomes we care about. However, because dictionary methods are simple and face-valid, they have a low barrier to entry: It costs little to experiment with them, and the potential gains are significant. By incorporating these methods in existing areas of research, it is possible to quickly and rigorously add quantitative behavioral data. In combination with self-report data and experimental outcomes, dictionary-based analyses of verbal behavior have the potential to help triangulate the psychological mechanisms at the core of many behavioral phenomena that are of interest to researchers across the social, behavioral, and computer sciences.