Meaning above (and in) the head: Combinatorial visual morphology from comics and emoji

Cohn, Neil; Foulsham, Tom

doi:10.3758/s13421-022-01294-2

Open access
Published: 02 March 2022

Volume 50, pages 1381–1398, (2022)

Memory & Cognition

Abstract

Compositionality is a primary feature of language, but graphics can also create combinatorial meaning, as with items placed above faces (e.g., lightbulbs to mean inspiration). We posit that these “upfixes” (i.e., upwards affixes) involve a productive schema enabling both stored and novel face–upfix dyads. In two experiments, participants viewed either conventional (e.g., lightbulb) or unconventional (e.g., clover-leaves) upfixes with faces which either matched (e.g., lightbulb/smile) or mismatched (e.g., lightbulb/frown). In Experiment 1, matching dyads sponsored higher comprehensibility ratings and faster response times, modulated by conventionality. In Experiment 2, event-related brain potentials (ERPs) revealed that conventional upfixes, regardless of matching, evoked larger N250s, indicating perceptual expertise, but mismatching and unconventional dyads elicited larger semantic processing costs (N400) than conventional-matching dyads. Yet mismatches evoked a late negativity, suggesting congruent novel dyads remained construable compared with violations. These results support the view that combinatorial graphics involve a constrained productive schema, similar to the lexicon of language.

General introduction

The compositionality of language has often been taken as a hallmark of linguistic structure, yet abstract, combinatorial meaning-making also appears in other modalities, like visuals. For example, hearts may float above a head or substitute for eyes to show lust, or a lightbulb above a head may indicate inspiration. While this type of “visual morphology” is stereotypical of comics (Cohn, 2013, 2018; Forceville, 2011, 2016; McCloud, 1993), it has proliferated across emoji and filters for photographs, where such morphemes can be applied to human bodies. These combinations require a comprehender to link modifying information (like hearts, lightbulbs) to a more stable form (like a character’s head) to derive a meaning beyond their parts. Because of these relationships, these combinations have been compared with affixes in the morphology of language (Cohn, 2013, 2018; Engelhardt, 2002; Forceville, 2011), making them “lexical items” in “visual languages” of graphics which may vary across cultures (Cohn, 2013; Cohn & Ehly, 2016; Forceville et al., 2010; Tasić & Stamenković, 2018). Because these forms integrate different sources of meaning in noniconic ways, and are growing in ubiquity throughout society, they provide a way to investigate combinatorial meaning-making outside the domain of spoken or signed languages.

Theories of visual affixation

Visual Language Theory (Cohn, 2013) posits that drawn information uses structural and cognitive principles similar to those of language. This is particularly salient in the combinatorial properties of visual morphology. In contrast to visual forms that can easily stand alone, some visual representations must attach to other forms. A speech balloon must connect to a speaker, while a motion line depicting a path must attach to a moving object. These forms cannot retain their meaning when free-floating and unconnected to another visual element. Because of this dependent nature, these forms have been likened to bound morphemes in language, which must affix to a more primary stem (Cohn, 2013, 2018; Engelhardt, 2002; Forceville, 2011). Thus, a speech balloon is a visual affix which attaches to the stem of a speaker. This dependent nature may also be hierarchical. Consider the lightbulb that floats above characters’ heads to convey inspiration. Typically, the lightbulb also has radial lines that emanate from it to depict its brightness. Here, the radial lines affix to the lightbulb, and this composite then affixes to the stem of a person. Thus, visual affixes appear to attach in a hierarchical structure (Cohn, 2018).

In addition, like in linguistic systems, visual morphology uses various physical strategies to combine bound morphemes with their stems (Cohn, 2013, 2018; Forceville, 2019). Affixes can physically attach to stems through juxtaposition, such as lightbulbs above the head, speech balloons, or motion lines. They can also substitute for parts of a stem, as in linguistic suppletion, such as replacing eyes with hearts, or dotted lines replacing solid lines to show invisibility. Finally, parts of an image may be repeated, similar to reduplication in language, such as repeating a body part multiple times to show that it is moving. These basic strategies of attachment, substitution, and repetition reflect the variety of possibilities for forms to connect with each other, and thereby are abstractly similar across verbal and graphic modalities.

Visual morphemes may also vary in the way they convey meaning. Some have stable meanings (the heart shape, speech balloons, motion lines) while some in context change from iconic to symbolic meaning (lightbulbs for inspiration, gears for thinking), possibly invoking conceptual metaphors that draw on the iconicity to convey such symbolicity (Cohn, 2018; Forceville, 2011, 2016; Szawerna, 2017). In such cases, an emergent, construed meaning arises that is greater than the sum of the parts: Nothing about a happy face or a lightbulb alone conveys inspiration, but together they create this emergent meaning (Cohn, 2018). These conventionalized meanings are recognized easily by most viewers (Cohn et al., 2016). It should be noted that this again aligns with linguistic morphology, such as the construal of meaning between words within compounds, which runs the gamut of conceptual relationships (Jackendoff, 2009).

While Visual Language Theory hypothesizes a relationship between the structure and cognition of language and visual representations, such relations exist at an abstract level. For example, it is not that “affixation” in visual representation “is like” or “is parallel to” that of linguistic morphology, but rather that basic strategies of cognition operate in both the verbal and visual forms in similar ways. Whether both modalities recruit overlapping domain-general mechanisms is an empirical question. This issue is explored for visual morphology in the current research (Experiment 2) but is also suggested by prior work showing similar neural responses to the processing of visual narrative sequences as to sentences (Cohn, 2020b), and that verbal information can be modulated by a visual sequential context, and vice versa (Federmeier & Kutas, 2001; Ganis et al., 1996; Manfredi et al., 2017; Weissman & Tanner, 2018).

Visual affixation and memory

Given that visual affixation involves the construction of meaning out of disparate parts, it raises the question of how this information is encoded and processed. Theories of processing visual morphology also take on the character of linguistic theories (Leminen et al., 2019). One possibility is that combinatorial visual morphemes are stored in memory on an item-by-item basis. This type of full-entry theory of lexical storage as applied to words would imply that derivations of different words, both regular and irregular, are independently stored as a whole in the lexicon (Jackendoff & Audring, 2020). Applied to visual morphemes, individual affixes would then become encoded in long-term memory individually, with a “lexicalized” meaning that is unique and conventionalized (Kennedy, 1982; McCloud, 1993; Walker, 1980). Evidence for such encoding comes from findings that the frequency of conventionalized morphology, and familiarity with them, influences their comprehension (Cohn et al., 2016; Nakazawa, 2016; Newton, 1985).

A full-entry theory would posit that understanding of visual morphemes involves the retrieval of specific stored meanings from memory, with little interaction from the stems themselves (Feng & O’Halloran, 2012; Ojha, 2013). In other words, placing squiggly lines above a head should evoke the meaning of anger (Fig. 1a), regardless of the facial expression of the stem. Thus, novel affixes, like those in Fig. 1b, would be construable as an independent meaning if they used conventionalized representations (such as peace signs), but others might be viewed as incongruous or meaningless as upfixes until they become encoded in memory.

An alternative perspective views that, though visual lexical items may be entrenched in memory, their compositional meaning is dynamically construed given their context (Bateman & Wildfeuer, 2014). Such a dynamic-construal theory renders visual morphology equal across different degrees of conventionality, and across their combinatorial interaction with other elements (like faces). Here, whether they are conventionalized or not, meanings of visual morphemes would be computed online with reference to their discourse context. Indeed, context allows many visual morphemes to be understood beyond their item-based encoding. Some affixes change in meaning based on their position (Cohn, 2013, 2018; Forceville, 2011; McCloud, 1993), such as stars above the head to mean dizziness, stars replacing eyes to mean a desire for fame, and stars adjacent to a body part to mean pain. Similarly, three “spiking” lines might mean surprise (third row in Fig. 1a), but next to a gold bar would mean shininess. Across these cases, the same visual representations change meaning depending on their context, and a dynamic-construal theory would thus extend this notion to all visual morpheme contexts.

Nevertheless, variation by position could remain conventionalized by item, and indeed the variable interpretations of stars (above head vs. in eyes) rely on conventionalized meanings for those different positions. If dynamic construal alone guides comprehension, it implies that all combinations are somehow construable, and none are truly incongruous. Such an extreme view has not been supported by experimentation where participants both explicitly and implicitly recognize some visual morphological combinations as more felicitous and meaningful than others (Cohn et al., 2016; Cohn & Maher, 2015; Ojha, 2013). Nevertheless, emergent meanings do arise from unconventional combinations between visual affixes and stems, implying that some construal process is at work (Cohn et al., 2016).

Within Visual Language Theory, we have proposed that some forms of visual morphology can become abstracted into productive classes (Cohn, 2013, 2018). In this lexical-schema theory, some visual elements may be lexicalized as instances, while belonging to a generalized template of a “lexical schema” that allows for novel combinations. This conception of a visual lexicon is consistent with constructional theories of morphology and the lexicon in language (Booij, 2010; Jackendoff & Audring, 2020), where productive lexical items are stored as declarative schemas with variables that are filled by specific instances. This architecture allows for lexical items (e.g., words) to be stored as memorized instances (awareness, happiness) but also to allow for abstraction of novel forms (comic-ness, emoji-ness) across the generalized structure (X-ness).

For bound visual morphemes, this schematic nature is exemplified by elements that float above characters’ heads, such as lightbulbs, hearts, gears, circling stars, and many others, as in Fig. 1a. These forms are argued to comprise a class of visual affixes placed above characters’ heads, and because these affixes are “up” from their stem, we call them upfixes (Cohn, 2013, 2018). These cases thus are posited to involve a schema with open variables for the upfix above a character’s head, and also a variable for the facial expression of the character. This overall relationship facilitates construal of upfixes to typically have meanings related to the cognitive or emotional state of the character.

This schema is further hypothesized to specify constraints on the position and relationship of an upfix to its stem. First, upfixes are constrained to a location above the head, and may appear incongruous if moved too far from an upward position. Second, the facial expression of the stem must “agree” with the upfix: A storm cloud upfix should appear awkward above a happy face, while a lightbulb should seem awkward above a confused face (Cohn, 2013). This combination allows upfixes and faces to work together in creating a singular construal, clarifying facial emotions that may be ambiguous on their own (Stamenković et al., 2018). Because upfixes are argued to involve a productive lexical schema, novel forms floating above characters’ heads could “fill the slots” of the schema that remain interpretable as some form of cognitive or emotional state. As such, novel forms should also adhere to these constraints of position and face–upfix agreement.

In prior research (Cohn et al., 2016), we examined the idea that upfixes formed a productive class governed by constraints on the relationship between the face and upfix (whether they matched or mismatched, as in Fig. 2) and their relative positions (above or beside the head). Upfixes that were beside the head were rated as less comprehensible than those above the head, while those mismatching the facial expression were rated even lower. These constraints also interacted, as mismatching faces that were also beside the head received the lowest comprehensibility ratings of all. These results suggested that upfixes are indeed constrained by both the spatial location relative to a face, and its facial expression. However, these same constraints appeared to operate on both conventional and unconventional upfixes. Though novel, unconventional upfixes were rated as less comprehensible than conventional ones, spatial location and facial mismatch again modulated these assessments.

This provided evidence that upfixes use a productive schema, and not simply full-entry memorized items, because these constraints operated on both established upfixes and novel ones. It also provided evidence against a full dynamic-construal theory (Bateman & Wildfeuer, 2014), since unconventional and/or mismatching dyads were less comprehensible than conventional ones. Moreover, participants overtly commented on the lack of ability to combine mismatches into a coherent, holistic interpretation, going against the idea that they would be dynamically construed.

Subsequent research examined response times and eye-movements to matching and mismatching face–upfix dyads that compared whether the face and/or upfix was in a cartoony or photorealistic style (Kendall et al., 2020). Here, participants’ response times did not differ based on face–upfix matching for recognizing the overall emotion of the dyad, which was also not modulated by the style of the face or upfix (cartoony or photorealistic). Participants fixated on faces prior to upfixes but spent more time dwelling and fixating on upfixes than faces as long as the face was cartoony. Analyses using more graded emotional valence of dyads found longer response times when the emotions of face–upfix dyads were more ambiguous than when they were more clearly matching or mismatching (Kendall, 2019).

While this research provided initial evidence for the schematic nature of upfixes as a productive class of visual morphology, the behavioral evidence for the matching constraint (response times) was mixed and focused on conventional dyads. Stronger evidence for an upfix schema would come from contrasting how these constraints operate across the conventionality of dyads. We therefore conducted two studies examining the constraints of conventionality and face–upfix matching to better assess the combinatorial structure of these visual morphemes. In both experiments we presented participants with the component parts of the dyad (face, upfix) before the composite face–upfix dyad, measuring the response times for participants’ judgements of their comprehensibility (Experiment 1) and their electrophysiological brain responses using event-related potentials (Experiment 2). This work thus aimed to further assess the processing of face–upfix relationships.

Experiment 1: Viewing times

Only a few studies have yet analyzed face and upfix relationships with behavioral measures. Prior work found that response times to recognize the congruency of face–upfix dyads did not differ between matching and mismatching dyads (Kendall et al., 2020), and these times did not differ depending on whether the face and/or upfix was in a cartoony or photorealistic style. Kendall (2019) further examined dyads with incremental changes in emotion: faces shifted from happy to neutral to sad expressions, while upfixes shifted from a sun to sun-with-clouds, and then only rain clouds. These graded upfixes were paired with faces so that they were gradually matching (happy-sun to sad-rain) and mismatching (happy-rain to sad-sun). Participants recognized the congruency of dyads along this gradient, but their response times to these judgements were slowest to medial mismatching (e.g., semi-happy face with semi-stormy upfixes) compared with relatively faster times to fully matching and mismatching dyads. These findings were thus mixed for how mismatching affects response times, but do not address how they operate in fully unconventional dyads.

In order to address these concerns, our first study presented participants with face–upfix dyads where the visual morphological elements (face/upfix) were presented one at a time prior to the compositional whole. We manipulated the (mis)match between face and affix across both conventional and unconventional dyads, as in Fig. 2. These stimuli were presented using a self-paced viewing task where participants had to assess how well they understood the meaning of the images at each screen, and we measured their response times to these judgments of comprehensibility.

If upfixes were stored in memory on an item-by-item basis, conventional upfixes should be easier to process than unconventional ones, with no advantage of matches over mismatches. Conversely, if comprehension is entirely based on construals, no differences should appear across our types: conventional matching face–upfix dyads would receive no advantage over unconventional or mismatching dyads, because they would all involve the same process of construal. These extreme positions are unlikely though, as our prior studies have shown sensitivities to conventionality and mismatching. If processing follows prior judgements (Cohn et al., 2016), to provide evidence of a productive schema, we expect that coherent conventional upfixes would be easier to process than unconventional ones, which should both be easier to process than mismatches, which violate the structure.

Methods

Stimuli

We used the stimuli from our previous study of upfixes (Cohn et al., 2016) consisting of 58 face–upfix dyads, including 28 conventional dyads (Fig. 1a) and 30 unconventional ones (Fig. 1b). Conventionality was confirmed using ratings measured from our prior study (Cohn et al., 2016: Experiment 1) for how familiar participants were with these dyads along a 1 (not familiar) to 7 (very familiar) scale. Conventional dyads had an average score of 5.8 (SD = .86), with unconventional dyads scoring an average of 3.4 (SD = 1.1). For each of these dyads where the face and upfix “matched,” we created mismatching relationships where the facial expression mismatched the upfixes, as in Fig. 2. Mismatches were again confirmed using ratings from our prior study (Cohn et al., 2016). This resulted in 116 total face–upfix dyads crossing conventionality and (mis)matching.

An additional 38 faces with non-upfix morphology were used as fillers to increase the heterogeneity of the stimuli and to prevent participants from anticipating the above-the-head positioning of the affix prior to the dyad. These included both conventional and unconventional face–affix pairs that were substituted for eyes (e.g., hearts or stars for eyes), on the forehead (vertical lines for dread), out the nose (bubble for sleep), or other idiosyncratic signs (like a zipper for a mouth). Variations also used signs that were similar to those used as upfixes but were then placed in a different position. This ensured that viewers would not know how faces and affixes/upfixes would be arranged. Thus, if a person saw a question mark and face in the first two images, they would not know until the final image if it would become an upfix (conventional) or would have the question mark come out of the nose (unconventional). These non-upfix morphemes were also presented in matching and mismatching types to amount to 76 total non-upfix arrangements.

To control for the order of information viewed by a participant, we presented stimuli either as face first, then upfix, then the full dyad, or upfix first, then face, then together. In total, this amounted to 232 upfix stimulus types across conventionality, matching, and order, and 152 non-upfix morphemes. Here, we collapse across these orders and analyze stimuli only at the face–upfix dyads.
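The stimulus totals reported above follow from crossing the base items with matching and presentation order. The short sketch below only re-derives that arithmetic from the counts given in the text; the variable names are illustrative and are not taken from the original materials.

```python
# Re-deriving the stimulus counts reported in the Stimuli section.
# Variable names are illustrative; only the numbers come from the text.
conventional = 28                      # conventional face-upfix dyads
unconventional = 30                    # unconventional face-upfix dyads
base_dyads = conventional + unconventional           # 58 matching dyads

matching_levels = 2                    # match, mismatch
orders = 2                             # face-first, upfix-first

upfix_dyads = base_dyads * matching_levels           # crossing with (mis)matching
upfix_stimuli = upfix_dyads * orders                 # crossing with order

filler_faces = 38                      # non-upfix filler items
filler_arrangements = filler_faces * matching_levels
filler_stimuli = filler_arrangements * orders

print(upfix_dyads, upfix_stimuli, filler_arrangements, filler_stimuli)
```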

Because of the large numbers of items throughout our whole stimulus set, we divided these stimuli into lists such that each participant saw 58 total dyads across our stimulus types. This comprised 29 upfix dyads (seven or eight of each of the four upfix types) and 29 non-upfix dyads (nine or 10 matching or mismatching). While this meant that each participant did not see all morphemes, across all participants, all dyads across all types were viewed in all presentation orders.

Participants

Our participants were 82 individuals (31 females, 51 males, mean age = 34.6 years, SD = 11.2) recruited online through social media. All participants gave their informed consent to participate. Pretest questionnaires assessed participants’ familiarity with reading and drawing various types of comics using the Visual Language Fluency Index (VLFI). VLFI scores have previously been shown to correlate with behavioral and neurocognitive aspects of visual narrative processing, including ratings of upfixes (e.g., Cohn, 2020a; Cohn et al., 2016). Participants’ VLFI scores covered a wide range (1.5–52.5), but on average they were very proficient comic readers, with a mean of 22.4 (SD = 12.3), where below 7 is low and above 20 is high fluency.

Procedure

Participants accessed the experiment through an online survey website (Qualtrics), where they first filled out the VLFI forms and then proceeded to the primary experiment. The self-paced viewing task used the jsPsych plugin (de Leeuw, 2015). Each trial began with a screen reading READY. When pressing a button, participants progressed through each of the three screens of the trial (face–upfix dyad or upfix–face dyad). At each screen in the trial, they were instructed to make a yes/no forced choice for whether they understood the meaning of the dyad. Because all trials were fixed at three screens each, this incremental judgement task was done so that participants would not speed through the stages without paying attention. At the end of the experiment, participants were asked for any further comments they may have and were thanked for their participation.

Data analysis

Our analysis focused on participants’ response times and comprehensibility judgments at the combined face–upfix dyad. For each participant, we averaged response times and comprehensibility judgements across individual items in a subjects analysis. For response times, we removed outliers that were below 200 ms or more than 2.5 standard deviations above the mean. We used a 2 × 2 ANOVA with within-subjects factors of Conventionality (conventional, unconventional) and Matching (match, mismatch) to analyze both our response times and comprehensibility judgements. Post hoc pairwise analyses used a Bonferroni correction.
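As a rough illustration of the trimming rule and the 2 × 2 within-subjects design described above, the following Python sketch uses only the standard library. It is not the authors' analysis script. Two assumptions are made here: the standard deviation is computed after applying the 200 ms floor, and over whatever list of response times is passed in. The sketch relies on the fact that, in a fully within-subjects 2 × 2 design, each effect has one degree of freedom, so its F(1, n − 1) equals the squared one-sample t of the corresponding per-participant contrast. The cell labels and toy values are invented for the example.

```python
from statistics import mean, stdev

def trim_rts(rts, floor_ms=200.0, n_sd=2.5):
    """Drop RTs below floor_ms or more than n_sd SDs above the mean.
    Assumption: the mean and SD are computed after applying the floor."""
    kept = [rt for rt in rts if rt >= floor_ms]
    if len(kept) < 2:
        return kept
    cutoff = mean(kept) + n_sd * stdev(kept)
    return [rt for rt in kept if rt <= cutoff]

def contrast_f(scores):
    """F(1, n-1) for a one-df within-subjects contrast (squared one-sample t)."""
    n = len(scores)
    t = mean(scores) / (stdev(scores) / n ** 0.5)
    return t * t, (1, n - 1)

def anova_2x2_within(cells):
    """cells: one dict per participant holding that participant's cell means.
    Hypothetical keys: 'CM' conventional match, 'CX' conventional mismatch,
    'UM' unconventional match, 'UX' unconventional mismatch."""
    conv = [(p["CM"] + p["CX"]) / 2 - (p["UM"] + p["UX"]) / 2 for p in cells]
    match = [(p["CM"] + p["UM"]) / 2 - (p["CX"] + p["UX"]) / 2 for p in cells]
    inter = [(p["CM"] - p["CX"]) - (p["UM"] - p["UX"]) for p in cells]
    return {
        "Conventionality": contrast_f(conv),
        "Matching": contrast_f(match),
        "Interaction": contrast_f(inter),
    }

# Invented toy cell means (ms) for four participants; not data from the study.
toy_cells = [
    {"CM": 600, "CX": 900, "UM": 700, "UX": 900},
    {"CM": 650, "CX": 950, "UM": 800, "UX": 940},
    {"CM": 580, "CX": 870, "UM": 690, "UX": 880},
    {"CM": 620, "CX": 910, "UM": 760, "UX": 900},
]

for effect, (f_value, df) in anova_2x2_within(toy_cells).items():
    print(f"{effect}: F{df} = {f_value:.2f}")
```

With 82 participants, the same contrast approach would give the F(1, 81) degrees of freedom reported in the Results.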

A follow-up items-analysis of response times examined whether conventionality affected response times in a more graded way across dyads. We used the ratings of each face–upfix dyad’s conventionality (1 = not familiar, 7 = very familiar) from our prior study (Cohn et al., 2016: Experiment 1). Averaging across response times for each participant, we here correlated response times for both matching and mismatching dyads with the average conventionality scores for each dyad from Cohn et al. (2016).
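The items analysis described above amounts to a Pearson correlation computed over dyads, with n − 2 degrees of freedom. The sketch below shows that computation with the standard library only; the function name and the toy values are illustrative and are not taken from the study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return (r, df) for paired item-level scores, with df = n - 2."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy), n - 2

# Invented toy items: mean conventionality rating (1-7) and mean RT (ms) per dyad.
conventionality = [2.0, 3.0, 4.0, 5.0, 6.0]
mean_rt_ms = [900.0, 850.0, 800.0, 750.0, 700.0]

r, df = pearson_r(conventionality, mean_rt_ms)
print(f"r({df}) = {r:.3f}")
```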

Results

Comprehensibility judgments

We first analyzed participants’ assessment of the comprehensibility of the face–upfix dyads which revealed a main effect of Conventionality, F(1, 81) = 166.2, p < .001, a main effect of Matching, F(1, 81) = 148.8, p < .001, and an interaction between them, F(1, 81) = 34.4, p < .001. These results arose because participants were more likely to rate conventional matches as comprehensible, less likely to rate unconventional mismatches as comprehensible, and with ratings of unconventional matches and conventional mismatches in-between, as depicted in Fig. 3a. All contrasts between dyads were significantly different (all ps < .001), except between the intermediate ratings of unconventional matches and conventional mismatches.

Response times

We next analyzed how long participants took to make their comprehensibility judgements. We found a main effect of Conventionality, F(1, 81) = 9.2, p < .005, a main effect of Matching, F(1, 81) = 48.2, p < .001, and an interaction between them, F(1, 81) = 12.4, p < .001. As depicted in Fig. 3b, these results arose because mismatching face–upfix dyads were responded to more slowly than matching dyads (all ps < .005), and while mismatching dyads did not differ in their response times based on conventionality (p = 1), participants responded to conventional matching dyads faster than to unconventional matching dyads (p < .001).

We next asked whether conventionality persisted in a more graded way. We correlated the ratings of conventionality for each dyad gathered in a previous study (Cohn et al., 2016) with participants’ response times for both matching and mismatching dyads. As depicted in Fig. 4, a negative correlation suggested that greater conventionality for matching dyads led to shorter response times, r(55) = −.549, p < .001. However, the correlation between conventionality scores and mismatching dyads was not significant, r(55) = −.107, p = .443.

We found no correlations between VLFI scores and either ratings or response times.

Discussion

This study examined the compositionality of, and constraints on, productive and nonproductive visual morphology. Consistent with prior findings (Cohn et al., 2016), conventional dyads were rated as more comprehensible than unconventional dyads, but within both types, mismatching face–upfix dyads were rated as less comprehensible than matching dyads. Thus, because even novel, unconventional dyads evoke certain constraints on face–upfix agreement, it suggests that these constraints operate across an abstract schema.

Participants’ response times in making these judgements supported these results but displayed different relationships between the dyad types. Here, mismatching face–upfix dyads were responded to more slowly than matching dyads, yet mismatching dyads did not differ across conventional and unconventional types. This suggested that the mismatching relationship between faces and upfixes drove the increase in response times, beyond conventionality. These results differ from prior work that did not evoke varying response times between face–upfix dyads on the basis of (mis)matching (Kendall, 2019; Kendall et al., 2020). Those prior studies used a smaller range of upfixes and emotional states as stimuli, with an additional contrast between cartoony and photorealistic images, which may have mitigated the congruency effects. Because mismatching incurs a similar processing cost for both unconventional and conventional upfixes, it implies that upfixes involve a productivity that can be violated regardless of their familiarity.

Though conventionality did not affect response times across mismatching dyads, it did modulate those of matching face–upfix dyads. Here, conventional dyads were responded to faster than unconventional dyads, which were both responded to faster than mismatches. These faster times to more conventional dyads occurred in a graded way, but again only for matching face–upfix dyads. Here, we showed that faster response times occurred on an item-by-item basis based on their conventionality scores (Cohn et al., 2016). This is consistent with previous findings that familiarity with face–upfix dyads increases their comprehension (Cohn et al., 2016; Newton, 1985). The faster times to conventional dyads further support that they have an advantage in processing over both mismatches and unconventional matching relationships. This implies that conventional matching dyads (the standard types) are stored in memory as part of a visual lexicon (Cohn, 2013; Forceville, 2011; Walker, 1980), while their differences with unconventional dyads imply that they are not construed uniformly in the absence of entrenched representations (cf. Bateman & Wildfeuer, 2014).

In sum, the higher ratings and faster response times to conventional dyads compared with unconventional dyads support some degree of encoding of item-based types in memory. However, because mismatches are rated lower than matches for unconventional face–upfix dyads, and because mismatching unconventional dyads incur a similar processing cost in response times as mismatching conventional dyads, the results support the presence of a schematic template abstracted beyond encoded types.

Experiment 2: Event-related potentials

In Experiment 1, conventionality modulated response times to matching face–upfix dyads but did not influence mismatching dyads. However, response times do not allow us to assess whether the cognitive processes at work during meaning-making are similar across conventionality and (mis)matching relationships. In Experiment 2, we thus measured event-related brain potentials (ERPs) to the same manipulations as in Experiment 1 in order to better assess the neurocognitive processing involved in these interactions. ERPs are a measure of the electrophysiology of the human brain in online processing, time-locked to particular stimulus events (here, the onset of each image).
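To make the idea of time-locked averaging concrete, the sketch below averages stimulus-locked epochs point by point and then takes the mean amplitude in a time window. It is only a schematic of the general ERP logic, not the recording or analysis pipeline used in this experiment. The 300–500 ms window, the sample times, and the voltage values are all assumptions chosen for illustration.

```python
def average_erp(epochs):
    """Point-by-point mean across equal-length epochs time-locked to stimulus onset."""
    if not epochs:
        raise ValueError("no epochs supplied")
    length = len(epochs[0])
    if any(len(epoch) != length for epoch in epochs):
        raise ValueError("all epochs must have the same number of samples")
    return [sum(epoch[i] for epoch in epochs) / len(epochs) for i in range(length)]

def mean_amplitude(erp, times_ms, start_ms, end_ms):
    """Mean voltage of the averaged waveform for samples with start_ms <= t < end_ms."""
    window = [v for v, t in zip(erp, times_ms) if start_ms <= t < end_ms]
    if not window:
        raise ValueError("no samples fall inside the window")
    return sum(window) / len(window)

# Invented toy data: sample times relative to image onset (ms) and two epochs (microvolts).
times_ms = list(range(-100, 700, 100))        # -100, 0, ..., 600
epochs = [
    [0, 0, 1, 0, -2, -4, -2, 0],
    [0, 0, 1, 0, -4, -6, -4, 0],
]

erp = average_erp(epochs)
# Assumed illustrative window around the 400 ms peak; not the authors' window.
window_mean = mean_amplitude(erp, times_ms, 300, 500)
print(erp, window_mean)
```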

Given that participants consciously rate unconventional and mismatching face–upfix dyads as less comprehensible (Cohn et al., 2016), we might expect such manipulations to create costs for how they create meaning. Indeed, prior work (Cohn et al., 2016) showed that the conventional face–upfix dyads evoke more stable interpretations for their meaning than mismatches or unconventional dyads. In ERPs, semantic processing is indexed by the “N400,” a negative polarity deflection peaking around 400 ms (Kutas & Federmeier, 2011). Though first discovered in the context of language (Kutas & Hillyard, 1980) and modulated by morphology (Leminen et al., 2019), the N400 has been well attested as an index of semantic processing across domains, including visual scenes, pictures, and visual narratives (Barrett & Rugg, 1990; McPherson & Holcomb, 1999; Sitnikova et al., 2008; Võ & Wolfe, 2013; West & Holcomb, 2002). The N400 is thought to index the access or retrieval of information in semantic memory, regardless of the modality (Kutas & Federmeier, 2011).

Because the N400 is sensitive to incongruities, we would expect that mismatching, and perhaps also unconventional, face–upfix relationships will evoke larger N400s than conventional matching dyads, because only the conventional dyads are encoded in memory. Mismatching faces with upfixes might be likened to irregular or violated morphology of words, which has been observed to elicit larger N400s than regular morphology (for review, see Leminen et al., 2019). For the dimension of conventionality, the N400 has long been shown to be sensitive to frequency effects across modalities. In language, high lexical frequency leads to attenuated N400s compared with low frequency or novel lexical items (Barber et al., 2004; Kretzschmar et al., 2015; Kutas, 1993; Leminen et al., 2019; Van Petten, 1993, 2014), while familiarity or repetition of pictures also attenuates the N400 (Curran & Cleary, 2003; Schendan & Kutas, 2002). Given these findings, similar frequency effects could be observed for the conventionality of upfixes, leading to attenuated N400s for the conventional matching upfixes.

Infrequency might also factor into the N400 for mismatching face–upfix relationships. If the N400 indexes a process of integration or construal beyond the retrieval of memory-based representations, the N400s to mismatches should be greater than to matching unconventional dyads. However, such integrative processes have often been associated with effects subsequent to the N400. Sustained negativities occur for persisting interpretive or inferential processing in both sentence and discourse contexts (Baggio, 2018; Bott, 2010) and in visual narratives (Cohn, 2020b), and strong, sustained negativities have been observed to novel compound words (Fiorentino et al., 2014). If novel upfixes demand an additional construal process relative to entrenched conventional upfixes, we may thus expect sustained negativities associated with that process.

In addition, later positivities have been associated with the updating or revision of a mental model of a representation (Baggio, 2018; Cohn, 2020b; Donchin & Coles, 1988; Kuperberg, 2016), and such processes could be posited as occurring to the integrative relationship between a face and upfix when they mismatch. Indeed, late positivities were evoked in a study of motion lines in comics—the lines that trail a moving object. When the lines reversed to depict a backwards action/motion in a panel from a visual narrative sequence, they evoked both larger posterior and anterior positivities than regular motion lines (Cohn & Maher, 2015). Reversed motion lines violate the structural expectations of their schematic relationship, and thus we might predict similar positivities to the violations introduced by our face–upfix mismatches. Indeed, similar positivities have been found to structural violations of naturalistic scenes, such as objects being placed in unexpected locations (Võ & Wolfe, 2013).

We also considered the possibility that upfixes might evoke differential ERPs associated with face processing. In a comparison of visual styles of emotional faces (Kendall et al., 2016), cartoony faces evoked an attenuated P1 when compared with photorealistic and rotoscoped faces. This was attributed to the low-level featural differences between cartoony and photorealistic faces, which were posited as feeding forward into the subsequent N170—an ERP component reflecting the neural processing of faces (Bentin et al., 1996) which is also sensitive to emotional expressions (Blau et al., 2007) and is modulated in comparisons of real faces and emoji faces (Gantiva et al., 2020; Weiß et al., 2020).

Subsequent research then compared iconic cartoony faces with versions where a novel symbol was substituted for the mouth, a “suppletion” similar to the more conventional zipper for a mouth (Kendall, 2019). In a first presentation, the iconic faces evoked a larger and earlier latency N170 compared with the relatively meaningless symbolic-mouth faces. However, participants then underwent a training phase that taught them to recognize these mouth-symbols as indicative of different emotions. Subsequent measurements of ERPs showed that the faces with mouth-symbols then evoked a greater N170 than the iconic faces, which was also greater than that to the symbolic-mouth faces in the pretraining measurement, though they remained at a later latency. Such results suggest that with a conventionalized understanding, this combinatorial visual morphology becomes processed similarly to iconic faces.

Interestingly, an apparent later deflection in the waveforms of this study remained unanalyzed. Here, the learned symbolic-mouth faces appeared to evoke a greater negativity (i.e., greater negative amplitude effect) than those from before the learning phase, yet both remained less negative than the iconic faces, which did not differ across phases. This negativity occurred between 200 and 300 ms with a peak around 250 ms, consistent with an N250, a posterolateral waveform implicated in both memory and perceptual processes which has been shown to be greater to faces that are more familiar than those that are less familiar (Begleiter et al., 1995; Schweinberger et al., 1995), whether that familiarity was encoded in memory or newly acquired (Tanaka et al., 2006). While primarily shown to faces, it has also been extended to familiarity with a complex visual representation (Scott et al., 2006). Such results have suggested that the N250 may index “perceptual expertise” for the access or formation of stored perceptual representations (Folstein et al., 2017; Schendan & Ganis, 2015). The implication for the Kendall (2019) study is that the reduced N250 arose to these symbolic-mouth faces because they were not familiar to participants, though the training phase enhanced their conventionality leading to a greater N250 (Folstein et al., 2017; Scott et al., 2006).

These results suggest that unconventional visual morphology (like our unconventional upfixes) can evoke early effects associated with visual complexity and familiarity. Nevertheless, these studies by Kendall presented stimuli with a fairly short duration of 300 ms, making it difficult to examine downstream components such as the N400. Here, we therefore aim to further examine the neurocognition of combinatorial visual morphology. In line with the research reviewed above, we predicted that, like Kendall (2019), we would observe a reduced N250 to unconventional relative to conventional dyads. However, we expected that both unconventionality and mismatching would lead to larger N400s, as they would impose greater costs on the access and/or integration of meaning. If both mismatching and unconventionality separately contribute to such access, we might expect a graded or additive effect, as in the results of Experiment 1. Finally, unconventionality and mismatching may also influence later effects (late positivities, late negativities), although without prior findings we did not have specific expectations.