Introduction

We rely on semantic memory to understand words, interact with objects, and flexibly assimilate new information. This form of human memory is accordingly essential for navigating our most fundamental interactions with the world. Our empirical understanding of semantic memory has recently undergone radical revision. Biological plausibility has emerged as an essential constraint for models of conceptual representation, which have historically been rooted in philosophy and cognitive linguistics. Although we now enjoy unprecedented empirical power to elucidate the cognitive and neural architecture of semantic memory, a consensus on semantic organization remains paradoxically elusive. Our aims here are to discuss several factors perpetuating theoretical discord and to present our own perspective on two of the most commonly recurring and controversial topics in the study of semantic memory:

(1) Embodied vs. disembodied cognition: the extent to which semantic knowledge is grounded in perception, action, and somatic states, and the necessity of symbolic transformations of sensorimotor detail.

(2) Abstractness: the manner in which the brain represents concepts such as PROPOSITION and SYMBOL that are not clearly grounded in perception, action, or somatic states.

How embodied is the semantic system?

Neurologically constrained theories of semantic memory tend to fall along a spectrum defined by their central anatomical organizing principle. Fully distributed models have historically been strongly associated with embodied cognition in that they have no central point(s) of convergence and instead disperse perceptual and motor features across modal association cortices (Allport, 1985; Gage & Hickok, 2005; Meteyard, Rodriguez, Bahrami, Vigliocco, & Cuadrado, 2012; Pulvermüller, Moseley, Egorova, Shebani, & Boulenger, 2014; Pulvermüller, 2013).¹ In contrast, hub views are more commonly regarded as disembodied in that they propose local semantic binding sites that perform abstract symbolic transformations of sensorimotor knowledge (Lambon Ralph, Sage, Jones, & Mayberry, 2010; Patterson, Nestor, & Rogers, 2007; Rogers et al., 2004). We discuss potential strengths and weaknesses of these perspectives below.

Fully distributed models

Fully distributed models operate under the assumption that the brain decomposes object concepts into discrete sets of features stored in sensorimotor brain regions (e.g., premotor cortex for action, auditory cortex for environmental sounds) (Gallese & Lakoff, 2005; but see Martin, 2007). Repeated exposure to a correlated set of semantic features facilitates Hebbian learning, through which anatomically remote representations become functionally coupled. Under this view, object concepts reflect neural co-activation of features gradually instantiated through feature covariance (e.g., handles and sharp edges often co-occur). This feature-based approach has been widely invoked when modeling patterns of performance within semantic domains (e.g., abstract vs. concrete word recognition differences, semantic categorization) and patient populations (e.g., Alzheimer's disease) (Cree, McNorgan, & McRae, 2006; Cree & McRae, 2003; Farah & McClelland, 1991; Gonnerman, Andersen, Devlin, Kempler, & Seidenberg, 1997). For example, one might intuitively imagine how the semantic features of a banana decompose and disperse across relevant association cortices (Crutch & Warrington, 2003; Samson & Pillon, 2003).
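
The Hebbian coupling mechanism at the heart of this account is easy to caricature computationally. The sketch below (our illustration; the feature labels are hypothetical) uses a simple outer-product Hebbian rule so that features that repeatedly co-occur, such as handles and sharp edges, come to be strongly interconnected:

```python
import numpy as np

# Toy feature space: each row is one exposure to an object; each column is a
# sensorimotor feature (hypothetical labels: handle, sharp edge, ringing sound).
exposures = np.array([
    [1, 1, 0],   # a knife: handle + sharp edge
    [1, 1, 0],   # another handled, sharp-edged exemplar
    [0, 0, 1],   # a bell: sound only
], dtype=float)

n_features = exposures.shape[1]
W = np.zeros((n_features, n_features))   # coupling weights between features
eta = 0.1                                # learning rate

# Hebbian rule: features that fire together strengthen their mutual weights.
for x in exposures:
    W += eta * np.outer(x, x)
np.fill_diagonal(W, 0)                   # ignore self-connections

print(W)   # the handle and sharp-edge units are now strongly coupled
```

After learning, activating one feature partially reactivates its habitual partners, the distributed analogue of retrieving an object concept from one of its parts.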

The compositional assumption of distributed models has been widely criticized, however, on the grounds that combinations of semantic features have emergent properties (Jackendoff, 1987). In a linear mathematical system, for example, one can reasonably assume that the input (e.g., 2 + 2) yields a predictable output through simple addition. The classical view of concepts was premised on the assumption that semantic features combine in a similarly linear manner (e.g., yellow + sweet + pleasant odor = BANANA). This assumption has since proven untenable in the face of phenomena such as fuzzy category boundaries, typicality effects, and the resistance of abstract words to conventional binary feature-listing approaches (for refutation and alternatives see Murphy, 2002). Thus, it is unclear how an embodied semantic system composed exclusively of distributed sensorimotor regions could perform the nonlinear operations necessary for imbuing semantic feature binding with its characteristic emergent properties. Lambon Ralph (2014b) recently employed the metaphor of a recipe to describe this paradox, arguing that the mere presence of flour, butter, vanilla, and sugar does not ensure the presence of a cake. Similarly, the representation of concepts requires that the semantic system perform combinatorial operations upon constituent features: sensorimotor information alone is incapable of fully representing conceptual information.
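
The limits of linear composition can be shown numerically. In the toy example below (our illustration, not a model from the cited literature), the best possible linear combination of two binary features cannot capture a category that depends on their interaction, whereas adding a single conjunctive (nonlinear) term fits it exactly:

```python
import numpy as np

# Two binary features and an "emergent" category that depends on their
# interaction (an XOR-like pattern), not on their weighted sum.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Best least-squares linear combination of the raw features (plus a bias).
Xb = np.hstack([X, np.ones((4, 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(np.round(Xb @ w, 2))    # [0.5, 0.5, 0.5, 0.5]: no category structure

# Adding a conjunctive unit (feature1 AND feature2) recovers it exactly.
Xc = np.hstack([Xb, X[:, [0]] * X[:, [1]]])
w2, *_ = np.linalg.lstsq(Xc, y, rcond=None)
print(np.round(Xc @ w2, 2))   # exact fit: [0, 1, 1, 0]
```

The conjunctive unit plays the role of the oven in Lambon Ralph's recipe metaphor: something beyond the raw ingredients must perform the nonlinear binding.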

Abstract concepts such as PROPOSITION and SYMBOL pose another problem for fully distributed semantic theories: how could such concepts be tied to sensorimotor information? One prominent solution, Dual Coding Theory, holds that language and percepts constitute two parallel semantic systems: abstract concepts are exclusively verbally coded through linguistic associations, whereas concrete concepts share dual linguistic and perceptual codes (Paivio, 2013). A more radical view denies that abstract concepts exist at all, holding instead that all words are ultimately grounded in somatic states linked to perception, emotion, and introspection (for variants of grounding in abstract words see Barsalou, 2009; Borghi, Capirci, Gianfreda, & Volterra, 2014; Gallese & Lakoff, 2005; Kousta, Vigliocco, Vinson, Andrews, & Del Campo, 2011; Vigliocco et al., 2014).

Patient-based dissociations present a final challenge for fully distributed models. A distributed semantic network affords great redundancy and resilience to brain injury. This organizing principle predicts that only the most catastrophic bilateral brain injuries should produce global semantic impairments. Yet this is clearly not the case. Warrington's (1975) foundational case series first detailed the selective impairment of semantic memory in what is now known as semantic dementia or semantic variant primary progressive aphasia (svPPA). Subsequent investigations into the nature of the linguistic and conceptual impairments incurred in semantic dementia have generally demonstrated a multimodal semantic impairment linked to bilateral atrophy of a relatively circumscribed portion of the temporal lobes (Acosta-Cabronero et al., 2011). The combination of pathology and impairment incurred in semantic dementia suggests the presence of one or more semantic nexus points. This network principle is antithetical to fully distributed theories but central to the amodal hub approach, to which we now turn.

Amodal hub models

Proponents of amodal semantic theories argue that concepts undergo complex transformations from high-fidelity sensorimotor formats to symbolic representational formats (Fairhall & Caramazza, 2013). Hub proponents in particular hold that this shift from embodied to disembodied representation occurs within one or more convergence zones (Binder, Desai, Graves, & Conant, 2009; Damasio & Damasio, 1994). Numerous cognitive functions have been ascribed to hubs, including crossmodal integration, pattern association, cognitive abstraction, computation of similarity relations, and symbol formation. An amodal semantic system is capable of accommodating many aspects of cognitive abstraction (e.g., category induction, generalization to new exemplars), and the hub assumption also fits well with the ubiquitous semantic impairments that emerge in the context of temporal lobe atrophy in semantic dementia (Caine, Breen, & Patterson, 2009; Lambon Ralph, McClelland, Patterson, Galton, & Hodges, 2001; Lambon Ralph & Patterson, 2008; Rogers et al., 2006).

Despite the clear explanatory power of the hub approach, this perspective has its own unique set of shortcomings. Foremost, the neurobiological mechanisms by which hubs perform propositional transformations remain essentially a black box (Kandel, 2006). We must currently take it on faith that the language of thought involves a form of mental calculus that operates over abstract symbols: we have only the most rudimentary understanding of how the brain extracts and manipulates symbols (Deacon, 1998; Louwerse, 2011). Deacon (1998) argued that the co-evolution of language and brain (particularly the prefrontal cortex) has uniquely equipped Homo sapiens for symbolic cognition. However, the mechanism by which symbols are assigned and the neural representation of the symbols themselves remain far less specified than the neural dynamics of hierarchical processing within the early visual and auditory systems.

Another common objection to amodal hub theories arises from the symbol grounding problem (Harnad, 1990). Embodied cognitive systems ground the meanings of words and objects through direct mapping to physical objects, introspective states, and event schemas. In contrast, a disembodied semantic system is composed of symbols and propositions, all of which are ultimately abstracted away from physical referents. For a firsthand example of the grounding problem, consider a recent dialogue between the first author (who has never been to Australia) and an Australian family friend. Q: What's Sydney like? A: It's a lot like Melbourne. The circularity of defining an unknown (SYDNEY) via another unknown (MELBOURNE) is the crux of the grounding problem (for a related anecdote see Shapiro, 2008). The Sydney-Melbourne conundrum is amplified within large-scale amodal semantic approaches such as latent semantic analysis (LSA), in which the meanings of words (amodal symbols) are derived exclusively through implicit associations and co-occurrence statistics with other symbols (Landauer & Dumais, 1997), a situation that has been compared to learning a foreign language by studying a dictionary written in that language (Searle, 1980).

The trajectory of normal language acquisition offers a clear solution to the grounding problem faced by LSA and other amodal models. Zwaan (2008) notes that there are numerous modes of extracting meaning from associations and co-occurrence data in our environment. An attentive and curious infant learns co-occurrence relationships about visual stimuli, sounds, and emotional experiences in their immediate environment (e.g., teddy bears, blankets, and pacifiers are pleasant things that occur in my crib). Simultaneously, the same pre-linguistic infant is bombarded with explicit labels for these objects. This early stage of language acquisition is heavily reliant upon referential learning (Golinkoff, Mervis, & Hirsh-Pasek, 1994), wherein infants link arbitrary phonological symbols to the immediate objects in their environment, often through a combination of explicit instruction and exaggerated demonstration (Juhasz, 2005; Reilly, Chrysikou, & Ramey, 2007). Thus, our earliest learned words are often acquired through language-referent pairings that provide a perceptual grounding mechanism for more complex, later-learned modes of language and conceptual acquisition.

LSA is a model of semantic space that extracts concepts from relationships between words. LSA is, however, agnostic to the earlier forms of language-referent learning that might ground a core lexicon in perception and action. One appealing hypothesis is that the earliest learned words constitute a set of concrete primitives (e.g., SAD) from which we later expand to learn abstract concepts (e.g., MELANCHOLY) (Barsalou, 2008; Crutch & Warrington, 2005; see also the symbol interdependency hypothesis of Louwerse, 2011).
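
The machinery of LSA, and its grounding problem, can both be seen in a minimal sketch. The co-occurrence counts below are invented for illustration; real LSA operates over large text corpora:

```python
import numpy as np

# Toy word-by-context co-occurrence counts (rows: words; columns: contexts).
words = ["sydney", "melbourne", "banana", "mango"]
counts = np.array([
    [4, 3, 0, 0],   # sydney
    [3, 4, 0, 0],   # melbourne
    [0, 0, 4, 3],   # banana
    [0, 0, 3, 4],   # mango
], dtype=float)

# LSA: truncated singular value decomposition of the co-occurrence matrix.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
vectors = U[:, :k] * S[:k]      # each word as a point in a k-dimensional space

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(vectors[0], vectors[1]))   # high: SYDNEY is near MELBOURNE
print(cosine(vectors[0], vectors[2]))   # near zero: SYDNEY is far from BANANA
```

The model correctly places SYDNEY near MELBOURNE, yet nothing in the matrix says what either city is like: every symbol is defined only by other symbols, which is precisely Harnad's complaint.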

Online reconstruction of semantic representations

Reconstruction, filtering, and post-interpretive processing are well-accepted phenomena in episodic memory research. One compelling source of evidence for similar reconstructive processes in semantic memory involves variability in patterns of cortical activation when the same object concept is accessed through different modalities and task cues (Kiefer & Martens, 2010; Willems & Casasanto, 2011). For example, Van Dam and colleagues (2012) used a go/no-go paradigm where participants made judgments of objects naturally imbued with action and color salience (e.g., a tennis ball). Participants responded to either visual attributes of a word (e.g., “Is this object a green color?”) or an action property for the same word (e.g., “Is this word associated with a foot action?”). Probes of action properties selectively engaged motor cortex, whereas color probes did not activate the same regions. Similar contextual variability is also apparent in patterns of cortical functional connectivity. Using the same go/no-go paradigm, Van Dam and colleagues (2012) reported that probes of action properties strengthened connectivity between a putative hub region (posterior superior temporal sulcus) and motor cortex. That is, probes for action properties (e.g., “Is this word associated with a foot action?”) resulted in stronger functional coupling between superior temporal sulcus and motor cortex than probes for color properties (see also Hoenig, Sim, Bochev, Herrnberger, & Kiefer, 2008).
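
At the analysis level, the functional coupling reported in such studies amounts to condition-wise correlation between regional time series. The sketch below uses synthetic data (not Van Dam and colleagues' pipeline) to show the underlying computation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200   # number of fMRI time points (synthetic)

# Simulated time series for a hub region (pSTS) and motor cortex,
# generated so that coupling is strong in the action condition only.
hub_action = rng.normal(size=n)
motor_action = 0.7 * hub_action + 0.3 * rng.normal(size=n)
hub_color = rng.normal(size=n)
motor_color = 0.1 * hub_color + 0.9 * rng.normal(size=n)

# Functional connectivity as condition-wise Pearson correlation.
r_action = np.corrcoef(hub_action, motor_action)[0, 1]
r_color = np.corrcoef(hub_color, motor_color)[0, 1]
print(round(r_action, 2), round(r_color, 2))   # roughly 0.9 vs. 0.1
```

Stronger hub-to-motor correlation under action probes than under color probes is the signature of the context-dependent coupling described above.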

The role of flexible semantic reconstruction is also supported by studies of polysemy and metaphor. Hauk and colleagues (2004) demonstrated engagement of somatotopic regions of motor cortex corresponding to words with high motor effector salience (e.g., kick, pick, lick) in a lexical decision task (though see Postle et al., 2008). Raposo and colleagues (2009) note that polysemy and metaphor offer significant challenges for the somatotopic representation hypothesis (Louwerse & Jeuniaux, 2008, 2010; Mahon, 2014). That is, a word such as kick assumes a different sense in the context of phrases such as kick the football vs. kick the bucket. In their functional magnetic resonance imaging (fMRI) work, Raposo and colleagues (2009) demonstrated that the critical verb, kick, activates motor cortex only under congruent sentential contexts, a finding that challenges the notion that semantic representations are fixed. One possibility regarding ultra-rapid engagement of the motor complex for kick, pick, lick verbs is that these words reflect a small subset of the lexicon that enjoys privileged access to the sensorimotor system, unmediated by hubs. Coslett and colleagues (2002) proposed the related hypothesis that knowledge of body parts constitutes a dissociable subdomain within semantic memory. It is possible that this class of effector-specific verbs such as kick and pick engages this putative subdomain. Another possibility is that the earliest learned verbs are more strongly associatively linked to the motor system than later acquired verbs. In contrast, it is difficult to envision how many of the verbs within this manuscript (e.g., premised, engaged, modified, facilitate) could evoke a similar pattern of somatotopic engagement.

Representational pluralism: hybrid, multilevel approaches to conceptual knowledge

Dove (2009) argued that the shortcomings of hub and distributed theories necessitate a class of hybrid theories that integrate both embodied and disembodied components (see also Kemmerer, 2015; Zwaan, 2014). A range of hybrid semantic models now exists that is well equipped to handle this challenge. These models differ in how they achieve representational pluralism: either through a unitary semantic system (i.e., words and percepts converge upon an amodal semantic store) or through the coordinated activity of multiple semantic systems (i.e., language and sensorimotor semantics constitute parallel channels). In this section we review several hybrid, multilevel semantic frameworks.

The convergence zone framework

The convergence zone framework is a prominent example of a hybrid approach that relies on reciprocal activity between local cortical hubs and a distributed sensorimotor network (Damasio & Damasio, 1994). Damasio argued that semantic representations within hubs are unrefined and that these underspecified representations are enriched via retroactivation, whereby the sensorimotor system is re-engaged through motor enactment and simulation processes (Barsalou, 1999; Pecher, Zeelenberg, & Barsalou, 2004). On this view, local hubs are activated both during object perception and during semantic memory retrieval. During the early stages of perception, first-order convergence zones bind time-locked activity in early sensorimotor cortices. Next, second-order convergence zones combine patterns of activity relayed from first-order convergence zones. This pattern of hierarchical conjunctive processing continues until all relevant perceptual information is bound into a coherent representation. A key feature of this theory is that convergence zones do not contain the integrated representation itself. Instead, these brain regions act as pointers, or pattern associators, to activation patterns within lower-order cortical cell assemblies. Damasio (1989) argued that such retroactivation processes are integral for enriching "unrefined" representations. Simmons and Barsalou (2003) and Barsalou and colleagues (2003) extended this idea, arguing that the degree and specificity of enrichment processes are moderated by contextual demands. During semantic retrieval, the process reverses: top-down information guides activation of higher-order convergence zones, which guide activation of lower-order convergence zones, which in turn coordinate time-locked activation of early sensorimotor cortices (Meyer & Damasio, 2009).
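
The retroactivation principle, a hub that stores only a pointer yet can re-instantiate modality-specific activity, can be caricatured with a Hopfield-style associator. This is a deliberately simplified sketch of the idea, not Damasio's implementation:

```python
import numpy as np

# Two modality-specific halves of one concept, coded as +/-1 feature vectors.
visual = np.array([1, -1, 1, -1])    # hypothetical visual features
sound = np.array([-1, 1, 1, 1])      # hypothetical auditory features
pattern = np.concatenate([visual, sound])

# The "convergence zone" stores only co-activation weights (a pointer to the
# pattern), not a copy of the sensory content itself.
W = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(W, 0)

# Retroactivation: cue with the visual half alone and let the stored weights
# re-engage the "auditory" units in a time-locked fashion.
cue = np.concatenate([visual, np.zeros(4)])
reconstructed = np.sign(W @ cue)
print(reconstructed[4:])             # recovers the sound pattern [-1 1 1 1]
```

The weight matrix holds no sensory detail of its own; it merely coordinates the re-creation of activity in lower-order units, which is the essence of the pointer, or pattern-associator, role described above.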

Damasio (1989) initially proposed that the neuroanatomical localization of convergence zones is mediated both by the modality of information being processed and by its position within the hierarchy (see also Sporns, Honey, & Kötter, 2007 for related distinctions between provincial vs. connector hubs). Recent work within the convergence zone framework has utilized multivariate pattern analysis (MVPA) during fMRI of semantic processing to localize potential binding sites, most notably within the posterior superior temporal cortex (Man, Kaplan, Damasio, & Meyer, 2012). The convergence zone principle has been invoked to explain numerous cognitive and linguistic phenomena, including proper noun deficits, mirror processing impairments, "grandmother neurons", and contextual integration effects supporting the retroactivation of introspective mental states that underpin abstract concepts (Damasio, 1989; Meyer & Damasio, 2009).

The hub and spoke model

Patterson, Lambon Ralph, Rogers, and colleagues modified the original convergence zone framework into today's dominant hybrid approach, known as the Hub and Spoke Model of Semantic Cognition (Binney, Embleton, Jefferies, Parker, & Lambon Ralph, 2010; Lambon Ralph et al., 2010; Lambon Ralph, 2014a; Patterson et al., 2007). The hub and spoke model proposes dynamic interactivity between a series of modality-specific spokes and hubs situated bilaterally in the anterior temporal lobes (ATLs). Under this approach, hubs perform amodal transformations that facilitate cognitive abstraction by computing similarity relations between objects (Rogers et al., 2004). The hub and spoke model has considerable explanatory power for abstract concepts and for the graceful degradation of conceptual knowledge incurred in dementia. Yet much remains to be learned about the cognitive and neural mechanisms underlying this model architecture. In particular, the contribution of sensorimotor simulation to the online reconstruction of object concepts remains underspecified. Other unresolved issues concern whether language acts as an ancillary verbal spoke and, more generally, how language is integrated within the model (see the "words" node in the model of Patterson et al., 2007).

When considering how hub and spoke models answer the call for pluralism, one point worth noting is that there may be a discrepancy between the structural and functional architecture of such models. That is, although the existing computational implementations of the hub units are architecturally amodal (e.g., Rogers et al., 2004), learning-induced attractor states in the trained model are likely to include hub units, some of which are functionally amodal but some of which are tuned to specific modalities (see also Crutch & Warrington, 2011). Recent studies of temporal lobe connectivity support the notion of progressive, hierarchical convergence of modality-specific information (e.g., auditory + visual detail) across the temporal cortices. For example, disparate features A, B, C, D gradually cohere into AB and CD, ultimately forming a coherent object unit, ABCD. The precise anatomy of this convergence process, and whether it is graded or discrete, remains debated. Hub and spoke proponents have most recently placed the endpoint of this feature-binding process, and the subsequent computational operations, within the anterior fusiform gyrus (Binney, Parker, & Lambon Ralph, 2012; but see Tyler et al., 2004).
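
A toy simulation, in the spirit of (though far simpler than) the Rogers et al. (2004) implementation, makes the structural/functional distinction concrete. The network below is architecturally amodal: every mapping is forced through one shared hub layer. Whether individual hub units end up modality-tuned or amodal is an outcome of learning, not of wiring:

```python
import numpy as np

rng = np.random.default_rng(0)

# Four items, two categories. Inputs are localist "word" units; targets
# concatenate visual (4 units) and auditory (2 units) feature vectors.
words = np.eye(4)
visual = np.array([[1, 1, 0, 0], [1, 1, 0, 0],
                   [0, 0, 1, 1], [0, 0, 1, 1]], dtype=float)
sound = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
targets = np.hstack([visual, sound])

n_hub = 3                                 # one shared, architecturally amodal hub
W_in = rng.normal(0, 0.5, (4, n_hub))
W_out = rng.normal(0, 0.5, (n_hub, targets.shape[1]))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for _ in range(5000):                     # plain backpropagation
    hub = sigmoid(words @ W_in)
    out = sigmoid(hub @ W_out)
    delta_out = (out - targets) * out * (1 - out)
    delta_hub = (delta_out @ W_out.T) * hub * (1 - hub)
    W_out -= 0.5 * hub.T @ delta_out
    W_in -= 0.5 * words.T @ delta_hub

# Learning induces similarity structure: same-category items tend to land
# close together in hub space even though no unit was assigned a modality.
print(np.round(sigmoid(words @ W_in), 2))
```

Inspecting the trained hub activations typically reveals some units responding in a graded, cross-modal fashion and others dominated by one output modality, which illustrates the structural/functional discrepancy noted above.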

The dynamic multilevel reactivation framework

We recently proposed a complementary, more explicitly multilevel semantic architecture that specifies the nature of hub-spoke interactivity, an approach we term the Dynamic Multilevel Reactivation Framework (Reilly & Peelle, 2008; Reilly et al., 2014). Our model hypothesizes that semantic memory is subserved by a series of hubs that re-engage sensorimotor spokes during the online reconstruction of object concepts. Figure 1 illustrates a simple schematic of how the hub and spoke systems interact. The hub system is composed of both low- and high-order hubs. Low-order hubs (e.g., angular gyrus, posterior middle temporal gyrus) have high node centrality and massive reciprocal connectivity with sensorimotor regions. As such, low-order hubs are especially suited for heteromodal feature binding (Bonner, Peelle, Cook, & Grossman, 2013). This hypothesis is in line with proposals that regions of the angular gyrus play a critical role in establishing combinatorial semantic relationships between congruent concepts (e.g., red apple vs. fast blueberry) (Bonner et al., 2013; Graves, Binder, & Seidenberg, 2013; Price, Bonner, Peelle, & Grossman, 2015). Recent structural connectivity studies using tractography have also demonstrated powerful coupling between these putative low-order (angular gyrus) and high-order (temporal pole) hubs during both verbal and non-verbal tasks, such as reading a sentence describing an event and viewing a picture of the same event (Jouen et al., 2014).
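
The notion of node centrality invoked here can be made concrete with a toy graph analysis; the regions and connections below are purely illustrative:

```python
import networkx as nx

# Toy cortical graph: three modality-specific modules plus one candidate hub.
G = nx.Graph()
modules = {
    "vision": ["V1", "V2", "MT"],
    "audition": ["A1", "A2", "STG"],
    "motor": ["M1", "PMC", "SMA"],
}
for nodes in modules.values():
    # Fully connect the regions within each module.
    G.add_edges_from((a, b) for a in nodes for b in nodes if a < b)

# A connector hub (labeled AG for angular gyrus) links across the modules.
G.add_edges_from([("AG", "MT"), ("AG", "STG"), ("AG", "PMC")])

# Betweenness centrality flags AG: all between-module shortest paths cross it.
centrality = nx.betweenness_centrality(G)
print(sorted(centrality.items(), key=lambda kv: -kv[1])[:3])
```

In graph-theoretic terms, a low-order hub in our framework behaves like a connector hub: a node whose high centrality derives from bridging otherwise segregated modules.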

Fig. 1 A multilevel, multi-hub semantic model. Hypothetical schematic of three quasi-modular sensorimotor spoke systems (e.g., vision, audition, motor) bounded by dotted lines. Provincial hubs (within each module) feed a series of low-order connector hubs (e.g., angular gyrus, posterior middle temporal gyrus). These low-order hubs facilitate heteromodal feature convergence through binding, pattern recognition, and pattern completion. This coarsely bound information then streams to high-order hubs in the anterior temporal lobes that conduct nonlinear, symbolic transformations

Activity within low-order hubs can be characterized as heteromodal in that sensory features are bound within these regions (for a discussion of "first-order" sensorimotor integration processes within the angular gyrus see Seghier, 2013). We hypothesize that high-order hubs situated primarily within the anterolateral temporal lobes conduct symbolic transformations upon these bound representations. During this transformation process, conceptual knowledge is abstracted from its sensorimotor roots via a series of successive processing stages whereby perceptual and linguistic knowledge ultimately converge (sensory ⇒ heteromodal ⇒ amodal). Under this view, amodal representations are unrefined and require enrichment through sensorimotor simulations. Impoverished stimulus conditions (e.g., non-canonical situations, atypical exemplars, fragmentary input) and complex task demands drive these enactment processes, which are carried out through the spoke system.² This view emphasizes the dynamic nature of concepts and the fact that the degree of sensorimotor reactivation required for a particular concept depends on the unique demands of the task at hand.

Our view is that hubs form the core of the semantic system, whereas sensorimotor spokes act as a supporting halo. Task demands and depth of processing modulate interactivity between these two components, and this interactivity is mediated by a cognitive control system (see also Corbett, Jefferies, & Lambon Ralph, 2011; Jefferies, Patterson, & Lambon Ralph, 2008). Support for this perspective includes a recent voxel-based lesion-symptom mapping study that correlated stroke-related left hemisphere cortical damage in aphasia with selective deficits in generating the names of manipulable objects (Reilly et al., 2014). In this work, we examined patients with extensive left inferior frontal lobe damage impacting Broca's area and adjacent regions of the motor complex (ventral premotor and motor cortex). A strongly embodied view predicts that damage to regions of the motor cortex that mediate skilled motor movements of the dominant (right) hand would compromise both the ability to execute actions and the ability to covertly simulate their corresponding motor plans.

We examined patient performance and lesion correlates for generating exemplars of manipulable categories (e.g., "name a hand tool") relative to non-manipulable categories (e.g., "name a mountain range"). Lesion mapping revealed no correlation between integrity of the motor cortex and performance on generating manipulable exemplars, a null result consistent with prior studies of tool naming among patients with profound limb praxis impairment (i.e., apraxia) (Negri et al., 2007; Rosci, Chiesa, Laiacona, & Capitani, 2003). Among the patients we investigated, integrity of the angular gyrus (a hub) and MT/V5+ (a visual spoke projection implicated in motion perception) instead predicted impairment.

Additional evidence for the Dynamic Multilevel Reactivation Framework comes from a recent fMRI study of healthy young adults (n = 18) (J.R., A.C., & R.J. Binney, in preparation). In this study, participants learned a series of novel tools and animals via animated videos in which the target item moved in an eccentric path and manner and made animal-like or tool-like noises while a narrator announced its name. We trained participants to 100 % naming accuracy and 1 week later scanned participants while they named both the novel objects and a set of familiar tools and animals. The critical experimental manipulation was that participants named each item from exposure to only one of its constituent modality-specific features (e.g., visual form or environmental sound) during three separate modality-blocked runs. We reasoned that hub organization would be supported if a conjunction analysis revealed a common core intrinsic to all modalities. In contrast, a fully distributed approach would be supported by a lack of overlapping regions, or if one feature in isolation (e.g., sound) activated a distributed representation encompassing the other features (e.g., visual form). We found support for the multiple-hub perspective illustrated in Fig. 1. A conjunction analysis [(Aud_familiar − Aud_novel) ∩ (Vis_familiar − Vis_novel)] demonstrated that naming a familiar item from its visual form and naming the same item from its associated sound (e.g., a dog barking) engaged a common network of both high-order hubs (anterior temporal lobe) and low-order hubs (posterior middle temporal gyrus) (Fig. 2).
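
For readers unfamiliar with conjunction logic, the simplified sketch below (synthetic data, not our SPM pipeline) shows the core operation: a voxel counts toward the conjunction only if it survives threshold in both contrasts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy voxelwise t-maps for the two contrasts (familiar - novel), one per modality.
t_aud = rng.normal(size=1000)
t_vis = rng.normal(size=1000)
t_aud[:50] += 4.0      # a shared "hub" cluster responds in both contrasts
t_vis[:50] += 4.0

t_crit = 3.1           # illustrative voxelwise threshold

# Conjunction: the intersection of the two independently thresholded maps.
conjunction = (t_aud > t_crit) & (t_vis > t_crit)
print(conjunction[:50].sum(), conjunction[50:].sum())   # mostly the shared cluster
```

Voxels surviving this intersection are modality-general by construction, which is why the analysis isolates candidate hub territory rather than spoke territory.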

Fig. 2 A functional magnetic resonance imaging (fMRI) conjunction analysis of naming from sound and visual form. The renderings reflect a conjunction analysis conducted in SPM8 (Wellcome Trust Centre for Neuroimaging) as 18 participants covertly named a series of familiar relative to novel concepts. The conjunctions represent "common" activation when naming objects from only their sound or visual form [(Aud_familiar − Aud_novel) ∩ (Vis_familiar − Vis_novel)]

Using PET and a different cognitive subtraction method [(Tools_visual + Animals_visual) ∩ (Tools_auditory + Animals_auditory)], Tranel and colleagues (2005) identified a modality-neutral region of the inferior temporal lobe that was commonly activated when naming from the sounds and visual forms produced by both animals and tools (relative to scrambled sound and visual baselines). Tranel and colleagues were specifically interested in the role of this brain region in lexical retrieval, serving as an intermediary link between conceptual processing within the ATLs and post-lexical form encoding processes. In our analyses, we found a different distribution of more superior and anterior temporal lobe activity. This discrepancy is most likely due to the conjunction method we employed [i.e., familiar − novel], which effectively subtracted off the effects of lexical retrieval and subsequent post-lexical processes, focusing instead on areas commonly activated for the semantic features of familiar concepts. These differences highlight the inherent complexities involved in parsing the variance of semantic structure from a multifactorial linguistic task such as naming.

The challenge of abstract words

Most theories of conceptual knowledge rest largely upon experimentation with concrete concepts. The question of how abstract concepts are represented in the brain presents particular challenges to a number of these accounts. Investigations of abstract concept knowledge, and of the representational differences between abstract and concrete concepts, have approached the topic from a variety of perspectives. Some accounts focus on discrepancies in the amount of information available for concrete words relative to abstract words, including more semantic features (Plaut & Shallice, 1993), superior ease of predication (Jones, 1985), and more facile access to contextual information (Schwanenflugel & Shoben, 1983). Other accounts focus on qualitative differences, such as the claim that abstract words are more dependent upon associative than perceptual or similarity-based information, whereas concrete concepts show the reverse tendency, an approach framed within the qualitatively different representations (QDR) hypothesis (Crutch & Warrington, 2005). A further category of studies has addressed similarities and differences in the neural substrates of abstract and concrete concepts, drawing upon patient studies (Bonner et al., 2009; Loiselle, Rouleau, Nguyen, & Dubeau, 2012), fMRI (Binder et al., 2005; Wang, Conder, Blitzer, & Shinkareva, 2010), electrophysiological investigations (Barber, Otten, Kousta, & Vigliocco, 2013), and transcranial magnetic stimulation (Pobric, Lambon Ralph, & Jefferies, 2009).

Some studies have combined multiple perspectives. For example, using a synonym judgment task in which the quantity of relevant contextual information was varied, Hoffman et al. (2014) found greater activation of the anterior temporal lobes in the presence of relevant information (consistent with a role in representing conceptual knowledge) and of inferior prefrontal cortex in the presence of irrelevant information (where appropriate aspects of meaning have to be selected, consistent with a semantic control function). Similarly, dual coding theory (Paivio, 2014) can be regarded as combining quantitative perspectives (greater representational strength for concrete items) and qualitative perspectives (verbal and visual information). Several other pluralistic models of abstract-concrete concept representation in the tradition of dual coding theory have recently been proposed, including the words as social tools (WAT) hypothesis (Borghi, Scorolli, Caligiore, Baldassarre, & Tummolini, 2013) and the language and situated simulation (LASS) model (Barsalou, Santos, Simmons, & Wilson, 2008). Dove (2014) is an especially strong proponent of the perspective that language acts as an embodied mode of thought, yielding a parallel and augmentative workspace for sensorimotor conceptual processing. Perhaps the closest theory to date to an account incorporating quantitative, qualitative, and neural perspectives is Shallice and Cooper's (2013) hypothesis that abstract concepts rely on modal logic for abstracting over events, applying modal operators recursively, or representing hypothetical events. Shallice and Cooper propose that these processes give rise to semantic associations between abstract concepts and depend critically upon the left lateral inferior frontal cortices.

One critical step toward elucidating abstract concepts is to develop a positive operational definition for the construct of abstractness. This necessarily involves looking beyond the sensorimotor channels traditionally implicated in the acquisition and representation of concrete concepts and considering a host of additional brain systems that may influence the formation of conceptual knowledge (Crutch, Troche, Reilly, & Ridgway, 2013; Troche, Crutch, & Reilly, 2014). For example, consider the role of magnitude information in concepts such as AMOUNT and LENGTH, the role of time in concepts such as MOMENT or HISTORY (Crutch et al. 2013), and the importance of emotion information in the representation of abstract terms more generally (Gallese & Lakoff, 2005; Kousta et al. 2011; Vigliocco et al. 2014; Vigliocco, Vinson, Lewis, & Garrett, 2004; Westbury et al. 2013).

Many previous empirical studies of word concreteness have isolated the tails of the concreteness spectrum, examining performance discrepancies for highly concrete words (e.g., beach) relative to highly abstract words (e.g., preponderance) (Binder, Westbury, & McKiernan, 2005; Crutch, Ridha, & Warrington, 2006; Pexman, Hargreaves, Edwards, Henry, & Goodyear, 2007; Reilly & Kean, 2007). Based on the ubiquity of this approach, one might logically conclude that concreteness is a fixed categorical distinction and that all concepts lend themselves to a binary classification as abstract or concrete; however, this is not the case. Many words resist dichotomous categorization as either concrete or abstract. Our position is that the graded nature of concreteness thwarts multiple-semantics approaches that require discrete processing mechanisms for abstract and concrete concepts. A more plausible and parsimonious alternative involves modeling the meanings of all words, irrespective of their concreteness, within a single high-dimensional semantic space. We hypothesize that numerous cognitive dimensions bound this space, including color, odor, motion, sound, emotion, social interaction, morality, time, space, quantity, polarity (i.e., positive/negative feelings), and valence. A key component of our approach is that every word has measurable salience within each of these domains and that all of the domains considered together constitute a topographic space across which word meanings are distributed. In recent work, we have termed this the abstract conceptual feature (ACF) approach (Crutch et al., 2013).

We recently subjected ratings for hundreds of individual abstract and concrete English nouns to a hierarchical cluster analysis using the ACF approach (Troche et al., 2014). Our first step was to pursue dimensionality reduction upon the original set of 12 cognitive domains (e.g., valence, arousal, ease-of-teaching, sensation). The factor analysis revealed three latent variables corresponding roughly to sensation, emotion, and magnitude. These variables define a three-dimensional space in which any word's meaning can be plotted and word-to-word similarity measured as Euclidean distance. Figure 3 shows the distribution of a larger set of 750 English nouns spanning the concreteness spectrum within a high-dimensional semantic space characterized by 14 cognitive dimensions.
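
The analytic pipeline can be sketched in a few lines. The ratings below are random stand-ins rather than the Troche et al. (2014) norms, and the factor solution is illustrative only:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Hypothetical Likert ratings: rows are words, columns are rated cognitive
# dimensions (e.g., valence, arousal, sensation). Real norms would go here.
words = ["beach", "preponderance", "trust", "moment", "history", "banana"]
ratings = rng.uniform(1, 7, size=(len(words), 12))

# Step 1: reduce the rated dimensions to a small number of latent factors.
fa = FactorAnalysis(n_components=3, random_state=0)
coords = fa.fit_transform(ratings)     # each word becomes a point in factor space

# Step 2: read semantic relatedness off pairwise Euclidean distance.
dist = squareform(pdist(coords, metric="euclidean"))
print(np.round(dist, 2))               # small distance = high predicted relatedness
```

With real norms, nearby words in this space should be semantically related, which is the property tested against patient comprehension data below.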

Fig. 3 A high-dimensional topography for word meaning. a Heatmap depicting Likert-scale ratings gleaned from 328 participants. Each of the 750 horizontal rows reflects one English noun, ordered on the y-axis from most abstract to most concrete using the Medical Research Council (MRC) Psycholinguistic Database norms (Coltheart, 1981). The x-axis reflects 14 discrete cognitive dimensions, aggregated by their relatedness via factor analysis (Troche et al., 2014). b Correlogram reflecting relations between the numerous predictors for the same 750 words depicted in a

We tested the validity of these distance metrics as markers of semantic relatedness in a number of ways. In one study, we demonstrated that ACF distance metrics outperformed latent semantic analysis distance metrics in predicting the comprehension performance (accuracy) of a patient with global aphasia on a series of spoken word to written word matching tests of verbal comprehension (Crutch et al., 2013). The higher error rate observed when identifying targets presented within word pairs with low ACF distances (semantically related) as compared with high ACF distances (semantically unrelated) indicates that the high-dimensional space generated from ACF ratings by healthy controls approximates the organization of abstract conceptual space. ACF ratings of polarity (positivity/negativity) have also been used to explain superior comprehension of antonyms relative to synonyms or other non-antonymous associates in three further patients with global aphasia (Crutch et al., 2012), suggesting that polarity is a critical semantic attribute of abstract words (see also Westbury et al., 2013).

One clear advantage of the ACF approach and related high-dimensional approaches (Moffat, Siakaluk, Sidhu, & Pexman, 2015; Westbury et al., 2013; Zdrazilova & Pexman, 2013) is that these models dispense with the artificial dichotomy of abstract vs. concrete. That is, the meanings of all words (abstract and concrete) can be modeled within a single semantic space. The ACF approach does not imply that abstract words constitute merely a list of features, or that modal logic machinery (Shallice & Cooper, 2013) or semantic control processes (Hoffman et al., 2014) are unnecessary. Rather, the assertion is that at least some of the information upon which such processes operate can be modeled compositionally, in parallel with feature-based approaches to concrete concepts. For example, the meaning of an abstract concept such as TRUST can potentially be decomposed within a high-dimensional space factoring a range of variables (e.g., arousal, perceptual salience, emotion), analogous to the method of decomposing concrete concepts into a perceptual feature space.

The high-dimensional topography approach to concept representation fits well within the Dynamic Multilevel Reactivation Framework, which predicts that many sources of modality-specific information about concepts converge and are then bound into a single, coherent representation. In turn, this coarsely bound representation is subjected to symbolic transformation. The numerous cognitive dimensions that bound the ACF approach act as the putative spokes within this framework. One feature of this approach that distinguishes it from many other models (e.g., Dual Coding Theory) is that it is a unitary semantics model (for alternate unitary approaches see also Andrews, Vigliocco, & Vinson, 2009; Caramazza, Hillis, Rapp, & Romani, 1990; Vigliocco et al. 2004). That is, the perceptual and linguistic systems ultimately converge upon a single semantic store.

Concluding remarks

Biological plausibility and theoretical necessity impose essential constraints on models of semantic representation. Amodal semantic models continue to feature prominently in the study of concept representation despite significant limitations in our understanding of the neural mechanisms that underlie symbolic transformations (for a mechanistic discussion of symbolic implementation within neural networks see Knoblauch, 2008). Embodied cognition in its pure form dispenses with symbols altogether by linking semantic memory directly to somatic states and perception. Thus, one might argue that, where symbols are concerned, embodied cognition currently holds an anatomical plausibility advantage. Yet fully distributed sensorimotor representations can only take us so far: challenges posed by abstract concepts, the limits of linear semantic feature decomposition, and patient-based dissociations (e.g., semantic dementia) call for something more.

We have described the distinction between embodied vs. disembodied cognition as closely aligned with the anatomical principle of distributed vs. hub organization. An anonymous reviewer raised the question of whether this characterization is entirely justified and whether it is possible to implement a distributed architecture for amodal hubs. Indeed, the Dynamic Multilevel Reactivation Framework reflects such an architecture, premised upon the coordination of multiple distributed hubs. Sporns (2012) and Sporns and colleagues (2007) have argued that there are several distinct variants of hubs (e.g., provincial vs. connector) and that the hub-spoke architecture is replicated at numerous levels within the cortical processing hierarchy.

Perhaps the most compelling advantage of multilevel models, including the Dynamic Multilevel Reactivation Framework, is their capacity to incorporate both embodied and disembodied perspectives. Within this approach, hubs assume a starring role, flanked by a supporting cast composed of spokes conveying not only sensorimotor and emotional information but also contributions from a host of other dimensions. We have also described a potential grounding solution whereby the meanings of abstract and concrete words cluster within a unitary, high-dimensional space. As with any incipient theory, the hard empirical support for both approaches awaits.