1 Introduction

Although enactive approaches to cognition vary in terms of their character and scope, all endorse several core claims. The first is that cognition is tied to action. Thinking, feeling, and perceiving are the “enactment of a world and a mind on the basis of a history of the variety of actions that a being in the world performs” (Varela et al. 1991, p. 9). The second is that cognition is composed of more than just in-the-head processes. Because cognitive activities are irreducibly embodied and situated, they are made up of processes looping through brain, body, and world—and thus (at least partially) externalized via features of our embodiment and in our ecological dealings with the people and things around us.

I here appeal to these two enactive claims, along with similar ideas in phenomenology, to consider a view called “direct social perception” (DSP). DSP is the idea that we can sometimes perceive features of other minds directly in the character of their embodiment and environmental interactions. However, my goal here is not to develop a detailed defense of DSP. I have done so elsewhere (Krueger 2012, 2018a; Krueger and Overgaard 2012). For this discussion, I will assume that DSP is plausible and probably even true. Instead, I will consider some consequences of DSP. One consequence, I argue, is that if DSP is true, we can probably perceive features of mental disorders as well. Some of these features are embodied in particular sorts of ways and thus fall within the scope of visual content. To develop this idea, I draw upon the developmental psychologist Daniel Stern’s notion of “forms of vitality”, which has been largely overlooked in these debates. And I argue further that this is more than just a philosophical issue. An enactive-inspired defense of DSP can, in the context of mental disorder, help clarify some ways we play a regulative role in shaping the temporal and phenomenal character of the disorder in question. It may therefore have practical significance for both the clinical and therapeutic encounter.

2 DSP and enactive externalism

Recently, philosophers of various stripes have defended DSP by arguing that we can directly perceive others’ mental states when we perceive their expressive and goal-directed behavior.Footnote 1 According to DSP, when I see my niece smile, I see part of her happiness; similarly, when she scurries off to retrieve her favorite puzzle, I see her intention to play with me embodied in her ongoing behavior. For defenders of DSP, mental states aren’t hidden away behind (and causally antecedent to) behavior. They are concretely embodied in behavior. Seeing this behavior is thus to see mental states directly, without inferential mediation.

DSP challenges the widely-held assumption that mental states are unobservable, perceptually inaccessible to everyone but their owner. Since we can’t perceive mental states, the thinking goes, we must instead use an indirect (i.e., non-perceptual) method to reach them, one based on inference or simulation. This presumed unobservability of mental states has even led some to argue that the question of whether a given entity (e.g., machine, animal, vegetative patient) is minded is, in principle, unanswerable (Gray and Schein 2012).

Defenders of DSP often draw upon phenomenological and enactive approaches since both reject the supposition that cognition is entirely head-bound.Footnote 2 Neither phenomenology nor enactivism disputes that brains are necessary for cognition, but they do question their sufficiency. For both, brains are part of larger systems—including bodies interacting with their environment—and cognitive processes are realized within the integrated brain-body-environment dynamics comprising these systems. As a result, many cognitive processes have external world-facing parts that can be seen by others. These parts are found within the character of our embodiment and our ongoing interactions with the people and things around us.Footnote 3

Since many emotions have characteristic facial, bodily, and behavioral signatures in a way other mental states like beliefs and desires may not, they are a focus of arguments supporting DSP. For example, phenomenologists like Scheler (1954) and Merleau-Ponty (2012) argue that emotions are perceptually available in expressive dynamics—psycho-physical “integral wholes” (Scheler) or “variations of belonging to the world” (Merleau-Ponty)—comprised not only of our embodiment and agency but also the objects and institutions around us (Krueger forthcoming a). More recently, a number of philosophers have developed enactive views of affectivity according to which moods and emotions are composed of physical processes—perceptual, autonomic, somatovisceral, chemical and hormonal, motoric, etc.—spanning brain, body, and (sometimes) world (Colombetti 2017; Colombetti and Roberts 2015; Hufendiek 2015; Krueger and Szanto 2016; Maiese 2011; Stephan et al. 2014).Footnote 4

One strategy for defending DSP is to supplement phenomenological and enactive frameworks with empirical work. Different studies appear to suggest that some bodily expressions are external parts of affective states like emotions. Some of this evidence comes from deficit studies. For example, individuals who experience diminished facial expression, whether congenital or acquired, often report diminished emotional phenomenology (Krueger and Michael 2012). Without the ability to spontaneously express their emotions via the relevant sensorimotor circuits like gestures and facial expressions—and activate efferent and afferent feedback mechanisms within these circuits—part of the emotion (i.e., its external expressive profile, critical for self-regulation) is missing and its experiential character diminished.

Defenders of DSP thus argue that the ontology of some mental states like emotions may include external parts; these parts belong to the physical vehicle needed to realize those states. And when we see this vehicle, such as when we see a smile, frown, or characteristic pattern of behavior, we see more than mere colors, shapes, surfaces, or motion; we see mind directly, in action.Footnote 5

3 DSP and mental disorders

Despite much recent interest in these debates, DSP has not found a prominent place within current discussions of mental disorders. But there are a few exceptions: Fuchs (2018), Gallagher and Varga (2015b), Gangopadhyay and Schilbach (2012), and Ratcliffe (2012), for example, endorse views broadly compatible with what I say below. Moreover, Schilbach and colleagues (Schilbach 2016; Schilbach et al. 2013; Bolis et al. 2017; Bolis and Schilbach 2018) argue that a second-person approach is needed to address the full complexity of mental disorders. From this second-person perspective, mental disorders are construed as more than brain disorders; they constitutively involve the disturbance or “mismatch” (Bolis et al. 2017) of multilevel mechanisms (i.e., biological, cognitive-behavioral, and sociocultural) spanning the integrated dynamics of brain, body, and sociocultural-environmental interactions.

The view I develop below is consistent with these approaches and can supplement them. However, my focus is more restricted in scope. Whereas Schilbach et al. (2013), for example, focus on shared interactional dynamics spanning individuals and the people they interact with, I focus primarily on the ontology of expressive actions at the heart of these everyday interactions. I argue that if DSP is on the right track, it seems plausible that disordered mental states are likewise perceptually accessible insofar as the individual’s expressions and behavior are part of the ontology of these disordered states. These expressive actions are a central part of the phenomenology of our social engagements; we see and evaluate them in others, and we respond in kind with our own expressive actions. These actions are thus the engine of intersubjectivity—but they are, nevertheless, one aspect of more encompassing processes, unfolding at multiple levels and timescales, that enable social interaction.

Of course, a critic could object at the outset that mental disorders stem exclusively from functional brain disorders. If so, observable behavior is not part of the ontology of these disorders. Insel and Quirion (2005), for example, begin their general defense of psychiatry as a clinical neuroscience discipline by insisting that “mental disorders be understood and treated as brain disorders” (p. 2221). But there are reasons to be skeptical about this unquestioned neuroreductionism. First, if—as increasing evidence from embodied and enactive cognitive science suggests—many cognitive processes ineliminably involve extra-neural loops spanning brain, body, and world, it’s unclear why this externalism doesn’t also apply to mental disorders. A separate argument is needed to justify this “neural sufficiency assumption” (Sprevak 2011) in the realm of mental disorders but not elsewhere. Second, despite ongoing brain-centered work within biomedical approaches, little progress has been made isolating neural signatures of, for example, schizophrenia, severe depression, or autism (Fuchs 2018).

Another worry might be that the problem of other minds is too abstract a philosophical issue to impact how we deal with mental disorders in everyday life. In what follows, I challenge this assumption. Mental disorders are sometimes thought to pose significant phenomenological and epistemic difficulties for those “on the outside” in that their internal character departs too widely from the structure and norms of everyday experience to allow for external comprehension. Jaspers, for example, famously said of schizophrenic experience that, “[w]e find changes of the most general kind for which we have no empathy but which in some way we try to make comprehensible from an external point of view” (Jaspers 1963, p. 577). Clearly, we cannot live through another’s disordered experience from their first-person perspective. Nevertheless, as we will see, DSP can remind us that we enjoy more immediate access to features of their experience than is sometimes thought. And this recognition can lead, in turn, to important insights concerning the ways we play a regulative role in shaping the character of that experience.

4 DSP and “forms of vitality”

As we’ve seen, defenders of DSP often support their view by arguing that some mental states are partially composed of external components like bodily expressions and behavior. Since the latter fall within the scope of visual content, there is (or so the argument goes) no special problem about accessing the former.

While a promising strategy for overcoming the problem of other minds, these debates sometimes get mired in metaphysical discussions (e.g., fine-grained mereological issues) that, while philosophically interesting, lose touch with the concrete reality of our mental life. More precisely, there is a danger here of rendering mental activity in excessively static and individualistic terms—such as when focusing on our purported ability to perceive external “tips”, “parts”, or “surfaces” of another’s mental states (Krueger and Overgaard 2012; cf. McNeill 2012). This narrow focus moves away from a core enactivist claim introduced previously: the idea, once more, that thinking, feeling, and perceiving are embodied and situated actions, things that we do. We don’t just perceive tips or parts of mental states like emotions. We perceive emotional expressions: environmentally-responsive actions that unfold processually, in time and space, and which elicit similarly dynamic responses from us (Stout 2010).Footnote 6

To further explore the significance of this idea, I turn now to a concept resonant with enactive principles: the developmental psychologist Daniel Stern’s “forms of vitality” (FV) (1985, 2010). This notion has not yet found a prominent place within these debates. I will argue that it can help safeguard against an overly-static rendering of mentality and enrich our thinking about different properties of action and expression in this context.

4.1 What, why, and how: defining FV

Near the start of his final book, Forms of Vitality, Stern writes:

We naturally experience people in terms of their vitality. We intuitively evaluate their emotions, states of mind, what they are thinking and what they really mean, their authenticity, what they are likely to do next, as well as their health and illness on the basis of the vitality expressed in their almost constant movements. (Stern 2010, pp. 3–4).

As Stern observes, to be alive and embodied is to be in near-constant motion. Movement begins early in human development. Active flexion of the spine is observed at 5–6 weeks of gestation; by 10 weeks of development, some arousal systems are online and begin to support arousal-triggered behavior (Piontelli 2006).

These movements become more pervasive and sophisticated after birth. Respiratory movements rise and fall every three to five seconds. Our bodies are perpetually active, from small movements like fidgeting, twitching, saccades, blinking, finger tapping, or micro-adjustments of posture, muscle tone, and facial expressions, to large movements like dancing, running, and leaping. Our inner life is similarly dynamic: sensations and emotions swell, recede, and dissipate; our attention shifts from one thought to the next; and our level of arousal continually fluctuates in response to things happening around us. Even individuals with Locked In Syndrome, who lack the capacity for overt bodily movements beyond vertical eye movements and blinking, still experience an inner flow of arousal, attention, and experience that, despite their loss of bodily agency, helps preserve their continuous sense of embodied and situated identity (Bauby 1997; Nizzi et al. 2012). The way it feels to be an embodied subject in the world is thus largely constituted by the ongoing integration of this double sense of movement, the dynamic “integration of many internal and external events” (Stern 2010, p. 4).

So what are FV, exactly? For Stern, they essentially involve movement or, better, ongoing patterns of movements. But FV are more than just movements. Lots of things move, from clouds and clock hands to paramecia and people. But not all of these movements are FV. Stern argues that “we cannot fully understand movement without knowing how it is deployed. Knowing the ‘what’ and ‘why’ of a movement is incomplete without knowledge of the ‘how’” (Stern 2010, p. 20).Footnote 7

This passage is important because it helps distinguish FV from other properties—what- and why-properties—of human movements (blinking, twitching) and intentional actions (playing the guitar, reaching for a beer). Although he doesn’t consider this point in detail, what Stern seems to mean is the following: what-properties specify a movement’s type, whereas why-properties specify the causal mechanisms, motivations, and intentions behind it. But how-properties (i.e., FV) are distinct from both of these: they specify the manner or style of an action. How-properties—like what- and why-properties—depend upon both movements and the physical structure of the agent. But how-properties are not reducible to what- or why-properties. This is because the latter properties can remain fixed even while how-properties vary. This variation is found in both individuals and groups.

For example, while learning to play guitar, the “what” (playing the guitar) and “why” (to make music and become a famous rock star) of my actions within numerous practice sessions over many years remain relatively stable, at least at a coarse-grained level of description. But the “how” of these actions will vary, often significantly. After many years of diligent practice, the “how” of my playing reflects my increased skill. My movements are now more confident, controlled, and precise in contrast to when I first picked up a guitar; I may even deliberately (or perhaps involuntarily) incorporate distinct stylistic flourishes into my technique. However, no two guitarists, even those with roughly equivalent skill sets and motivations (i.e., who share the same what- and why-properties), will play the same way or embody the same “how”. Likewise, the distinctive “how” of a jazz trio will become richer and more complex the longer members play together; this group “how” will develop over time from the gradual integration of the members’ individual styles. Its diachronic development unfolds even as the what- and why-properties of the group’s playing remain fixed. For Stern, these how-properties are part of the ontology “of almost all waking activities” and must therefore be part of our characterization of how it is we see actions in the world (Stern 2010, p. 10).

4.2 Seeing and responding to FV

With this background in place, we can now isolate two features of FV especially relevant to this discussion. First, they have a kinematic signature specified by features of the subject’s distinctive morphology, life-stage, and skill-set. The same subject will realize different FV throughout her lifetime. However, although her FV are constrained by structural features of her embodiment, the shape or contour of her available FV will nevertheless evolve as she develops and acquires new capacities that enable her to adapt, in increasingly sophisticated ways, to changes in her environment.

This latter point highlights a second important feature of FV: they are environmentally situated and thus have a normative character. In other words, the shape or contour of FV is determined not just by features of an individual’s embodiment but also by physical, social, and symbolic features of the context in which they arise. This situatedness is what enables the subject and her FV to fit into, and become regulated by, her environment. FV can be more or less effective, more or less adequate, appropriate, or correct given the demands of a particular situation.

For example, the aggressive style of a head-banging death metal guitarist will be a poor fit with the more subdued FV characteristic of a jazz trio. Expressive properties of the former will not mesh well with the latter and may hinder the group’s overall performance. Likewise, extravagant expressions of humor (a broad open-mouthed smile, loud laughter, big gestures, etc.) are welcome during a night out with friends but disruptive in a professional meeting, funeral, or worship space. In these latter contexts, other individuals may actively dampen or down-regulate inappropriate FV by shushing, speaking and gesturing quietly, or simply failing to reciprocate. Inappropriate FV will thus fail to be smoothly integrated into that environment. The important point is that, for Stern, FV are distinct from the “what” and “why” of an action—stylistic “how” properties of actions that are largely overlooked in DSP discussions of embodiment and expression. But crucially, FV are not simply blind expressions of an individual’s embodiment, their kinematic signature. They are also normatively situated in that they are responsive to, and regulated by, specific details and sociocultural practices of the concrete situation in which they occur.Footnote 8

As we negotiate everyday spaces at work, home, school, and play, our FV flexibly adapt to different normative expectations within these spaces. These adaptive processes unfold because we directly see and respond to others’ FV. There is evidence for both parts of this claim.

To begin with the first part: Runeson and Frykholm (1983) found that viewers of point-light displays could accurately judge the relative weight of a box lifted by an actor simply by observing the actor’s kinematics (see also Runeson 1985). Viewers could also accurately judge the weight actors expected to lift based upon their kinematics, prior to their actual lifting. And they could even tell when actors pretended to lift a heavy box, discerning both the actual weight of the box lifted as well as the weight the actors intended to convey to the viewer. The (imagined) context shaped how the viewers perceived and made sense of the actors’ FV. Similarly, Good (1985) found that viewers could, when watching point-light displays of staged social actions (e.g., asking for a light, chance meeting of old friends), discern whether the activity was intended and not simply a chance encounter. In these cases, viewers made relatively fine-grained perceptual discriminations that enabled them to pick out different FV, even with minimal supplementary information. These studies suggest both that we see FV directly, and that our perceptual sensitivity to expressive features of FV is crucial for understanding their social significance.

But this is only part of the story. We don’t merely see others’ FV. As the second part of the claim notes, we are responsively regulated by them, as they are by ours. When we interact with others, their expressive actions (gestures, facial expressions, postural adjustments, intonation patterns, movements and manipulations of shared space, etc.) directly modulate our expressive responses, and vice versa. The dynamic integration or “coupling” (De Jaegher and Di Paolo 2007) of these actions (including how-properties such as their kinetics, intensity, and timing) has been called the “social glue” that binds us together as embodied social agents (Lakin et al. 2003). Many of the bodily responses that animate this process, like motor mimicry and movement synchrony, are involuntary (Bernieri and Rosenthal 1991; Wiltermuth and Heath 2009). But some are not. Some expressive actions are performed in a deliberate manner or style intended to elicit particular responses from the perceiver. Your warm smile and friendly gesture, for example, elicit similar responses from me and motivate an array of further friendly expressions; conversely, a threatening gesture or aggressive movement compels me to tense up and prepare for my own aggressive response. Normative expectations in these situations, beyond mere features of our morphology, in this way impact the kinematic signatures of our distinctive FV. Simply put, FV are often co-regulated.

In sum, we’ve seen that defenders of DSP often support their view by arguing that some mental states are partially composed of external components like bodily expressions and behavior. Since the latter fall within the scope of visual content, there is, according to proponents, no special problem of access to the former. We see mind directly, in action. For Stern, FV are features (i.e., how-properties) of these actions and therefore can also be seen. The upshot is that we don’t just see others’ actions. We see their actions performed in a particular sort of way, as embodying a particular manner or style. This perceptual access to FV is what allows us to become responsively regulated by others and by the norm-governed environments we negotiate on a day-to-day basis.

5 DSP, FV and mental disorders

With this background in place, it is now time to consider DSP, FV, and mental disorders. In this first section, I use autistic spectrum disorder (ASD) as a case study. I argue that DSP and FV are relevant here in two ways. First, we can directly perceive features of ASD. These features are embodied in the style of the individual’s expressive movements (i.e., their FV), as well as in their failure to be responsively regulated by norms governing our everyday interactions, that is, a failure to fit into or be ecologically sensitive to norms specifying “appropriate” (i.e., neurotypical) ways of negotiating the world. Second, DSP can help us better understand some of the social difficulties people with ASD encounter. This is because people with ASD exhibit what I will refer to as “style blindness”: they quite literally fail to see certain qualities or patterns of neurotypical FV, which impairs their ability to become responsively regulated by the expressive norms regulating neurotypical interactions. These DSP-motivated insights suggest possibilities for developing flexible, more inclusive FV encompassing neurotypical and non-neurotypical patterns of social engagement.

5.1 FV and ASD

ASD is a disorder spanning a spectrum of social abnormalities. While these abnormalities are wide-ranging and vary with age and individual ability, they tend to cluster around a diagnostic triad of social, communicative, and imaginative difficulties (Frith 2003; Rutter and Schopler 1987). People with autism often exhibit a preference for order, predictability, and routine, and they can become preoccupied with a specific subject or activity. Deficits in affective bonding and emotional behavior are also present (Hill and Frith 2003; Hobson 1993). Individuals struggle with various aspects of social attunement: they often avoid direct gaze, have difficulty perceiving and decoding nonverbal cues found within facial expressions, gestures, and postures, and struggle to connect and develop relationships with peers. In short, most of the behaviors needed to establish and regulate social interactions are impaired (Gallese and Rochat 2018).

For several decades, these social difficulties were thought to arise from a core Theory of Mind deficit that impedes the individual’s ability to attribute mental states to others and to use these attributions to predict and interpret their behavior (Baron-Cohen et al. 1985). But this cognitivist and individualistic approach is no longer the consensus view. One worry is that it overlooks embodied and relational features of the individual’s social impairment—including self-other mapping deficits that arise from a dysfunction of perception–action coupling mechanisms (Rochat et al. 2013), as well as the role that interpersonally-distributed interactive factors play in shaping characteristic dysfunctions (De Jaegher 2013; Gallagher and Varga 2015a; Hobson 2002; Schilbach 2016).

Especially pertinent to DSP is growing sensitivity to the way people with autism use their bodies to move, express emotions, and perceptually respond to the social and material world around them (Doan and Fenton 2013; Eigsti 2013; Leary and Donnellan 2012). Sensorimotor approaches focus on the role that ASD FV play in helping individuals organize sensory information (Donnellan et al. 2012). From a neurotypical perspective, these FV may seem unusual or strange; their timing and kinematic qualities can appear odd or contextually inappropriate. For example, people with ASD may have an unusual gait or posture, and exhibit movements, tics, and habits (e.g., rocking, hand-flapping, spinning, exaggerated gestures, etc.) that are off-putting. But first-person reports suggest that people with autism experience their embodiment from the inside in ways that depart from neurotypical experience, too. The character of these anomalous bodily experiences contributes to their distinctive FV, which in turn leads to difficulties interacting with the social world of neurotypicals (Fuchs 2015). Both dimensions of ASD FV therefore warrant more careful consideration.

We can distinguish these two dimensions of ASD FV considered from an internal (first-person) and external (third-person) perspective, respectively. To begin with the former: first-person reports indicate that people with autism experience difficulties controlling, executing, and combining movements, from fine motor control, grip planning, and anticipatory movements, to more complex action-sequences like gesturing, reaching for a book, dancing, or negotiating a crowded hallway (Eigsti 2013; Leary and Hill 1996; Whyatt and Craig 2013). Sometimes these difficulties stem not just from objective, measurable coordination problems but also from a felt sense of diminished agency or loss of bodily control. This feeling seems connected to the sense that one’s body has a mind of its own, particularly when stressed or overstimulated: “I had an automatic urge to touch my body—rub my thighs or my stomach and chest” (Robledo et al. 2012, p. 6). At other times, however, individuals with ASD report diminished proprioceptive and kinaesthetic awareness of limb position and spatial orientation (Blanche et al. 2012). Difficulty locating one’s body in space can lead to challenges when it comes to smoothly interacting with the environment (Robledo et al. 2012). In order to cope, some individuals seek sustained deep pressure or joint compression to regain a felt sense of bodily integrity (Leary and Donnellan 2012, p. 60). Strategies include lying on the floor under a mattress or sofa cushions, jumping on the floor or bed, wearing multiple layers of clothing, banging fists on hard surfaces, or sitting in a plush recliner, bathtub, or swimming pool in order to have the experience of being touched all over.Footnote 9

These anomalous bodily experiences lead some people with ASD to feel as though their FV do not smoothly integrate with neurotypical patterns of interaction. For example, one individual says that, “I was sitting on the floor and when I got up after looking at a couple of books, my friend said I got up like an animal does”—and further, that although she is aware that her FV differ from those of neurotypicals, she remains unsure of how they differ, exactly (Robledo et al. 2012, p. 6). Another says that she will easily “lose the rhythm” required to perform sequences of action requiring two or more movements, and that “[e]verything has to be thought out” in advance (ibid., p. 6), which gives her movements an excessively stiff and unnatural quality. This felt disconnection both from their own body and from neurotypical FV, along with the negative reactions often perceived in others, can lead to frustration: “I have been endlessly criticized about how different I looked, criticized about all kinds of tiny differences in my behavior…No one ever tried to really understand what it was like to be me…” (ibid., p. 6).

These reports capture the character of ASD FV from the first-person perspective, which includes both anomalous bodily experiences (e.g., diminished sense of agency and proprioception) as well as an awareness that their ASD FV are not responsively integrated with patterns of neurotypical FV. From a third-person or external (neurotypical) perspective, ASD FV also have an anomalous character. For example, people with ASD may repeatedly shrug, squint, pout, or rock back and forth; repeatedly touch a particular object; turn away when someone tries to engage with them; maintain an unusual or inert posture, or appear “stuck” in indecisive movements for an uncomfortably long period of time; have trouble imitating actions; and require explicit prompts or cues to perform an action (Donnellan et al. 2012; Leary and Donnellan 2012; Robledo et al. 2012). A striking example here is delayed response in conversation. Donnellan and colleagues found that twelve young adolescents with minimal verbal skills, all of whom were labelled developmentally disabled or autistic, were in fact capable of offering competent conversational responses, but only, on average, after fourteen seconds of silence (Leary and Donnellan 2012, p. 57). Most neurotypicals would consider so long a pause awkward and either quickly change the subject or abandon the conversation altogether.

The important point is that, for sensorimotor approaches, ASD FV, considered from both a first- and third-person perspective, are constitutive of autism. ASD is not something separate from these FV, a cognitive impairment or neural dysfunction hidden inside the individual’s brain. Rather, these FV comprise experiential and behavioral dimensions that together make up the ontology of autistic ways of being in the world: autistic ways of moving, perceiving, and emoting (De Jaegher 2013). Indeed, diagnostic criteria for ASD within DSM-5 specifically pick out features of these FV: e.g., deficits in social-emotional reciprocity; abnormalities in eye contact and body language or deficits in understanding and use of gestures; stereotyped or repetitive motor movements; hyper- or hyporeactivity to sensory input, etc. (American Psychiatric Association 2013, pp. 41–42). To see these FV is therefore to see features of ASD in action. If DSP is true, we have perceptual access to these features as directly as we do neurotypical states, such as when we see part of another’s happiness embodied in their smile or anger in their frown.

To be clear about the scope of this claim, these characteristic FV need not be considered exhaustive of ASD, just as behavioral expressions of neurotypical emotions need not be considered exhaustive of the ontology of that emotion. It is likely that, as with emotions, ASD FV are what I have elsewhere in discussing DSP referred to as “hybrid” phenomena: distributed processes straddling internal (neural, physiological, phenomenological) and external (behavioral, expressive) parts (Krueger 2012). So, it could be, for example, that Theory of Mind deficits (Baron-Cohen et al. 1985) and/or a dysfunctioning mirror neuron system (Ramachandran and Oberman 2006) are also parts of ASD. But the idea that ASD can be reduced to cognitive deficits or neural dysfunctions faces, as we’ve seen, increased skepticism. It now appears more likely that ASD is a multilevel, multidimensional process whose outcome is driven by the interplay of diverse factors operating at different time-scales (evolutionary, cultural, social, individual-psychological) and levels of description (biological, cognitive-behavioral, phenomenological, sociocultural) (Bolis et al. 2017; Happé et al. 2006; Kendler et al. 2011; Walter 2013). In other words, ASD has a complex ontology that cannot be comprehensively understood by viewing characteristic social impairments merely as disordered function within single brains. Rather, characteristic features depend upon both circumstances and where one is located on the spectrum and may therefore vary by individual in terms of intensity and prominence (Gallagher and Varga 2015a). ASD thus involves a range of cascading disrupted processes, some of which have world-facing parts and dynamics that can be directly seen by others.

5.2 Style blindness and normativity

Earlier, I suggested that DSP and FV are relevant to considerations of ASD in a second way. Not only can these ideas help illuminate ways neurotypicals have direct perceptual access to features of ASD in action. Additionally, DSP can help us better understand the source of some of the social difficulties people with ASD encounter. This is because people with ASD exhibit what I will refer to as “style blindness”: they quite literally fail to see some patterns or qualities of neurotypical FV, which impairs their ability to become responsively regulated by the expressive norms regulating neurotypical interactions. This insight has consequences for how we think about the phenomenology of autistic ways of being in the world, and what sort of strategies we might employ to construct more inclusive FV encompassing ASD and neurotypical patterns of social engagement.

Support for style blindness in ASD comes from a number of sources. Several studies found that children with ASD have difficulties imitating actions performed with different styles (Hobson and Lee 1999; Hobson and Hobson 2008). These children did not differ from typically-developing children when it came to imitating the goal-directed nature of complex intentional actions (i.e., recognizing and imitating what- and why-properties) (cf. Boria et al. 2009; Hamilton et al. 2007). But they did struggle to imitate the style of the action (e.g., how-properties such as gentle or forceful), particularly when these how-properties were inessential for achieving the goal. One possible explanation for this result is the weak propensity of children with ASD to identify with others (Hobson 2002). But this imitative deficit does not on its own demonstrate a deficit in perceiving and understanding FV. In other words, the latter need not be seen as a primary deficit; it could be a secondary impairment parasitic on more fundamental deficits in self-other identification.

However, support for the idea that style blindness may be a primary perceptual deficit in ASD comes from another recent study. Rochat et al. (2013) investigated whether individuals with ASD were able to recognize FV expressed within dyadic interactions. Twenty high-functioning patients with confirmed diagnosis of ASD, along with twenty neurotypical controls, viewed sequences of short video-clips of different types of actions involving two people sitting across from one another at a table. Actions included giving a high five, shaking hands, pointing, caressing the other’s forearm, taking the other’s hand, giving or retrieving a mug, and holding up their hand to signal “stop” to the interlocutor. The same type of action (i.e., with the same what- and why-properties) was performed with different FV (i.e., different how-properties): e.g., a vigorous handshake in one clip, a gentle handshake in the other. Participants viewed combinations of these action clips and were then asked to make judgments about them. While participants with ASD were similar to neurotypicals in recognizing what- and why-properties, they made frequent errors in FV recognition—including showing significant difficulty recognizing similar FV (e.g., gentle) across different actions (e.g., handshake vs. high five). This study advances previous work insofar as it suggests that difficulties perceiving FV in ASD are not limited to an imitative context but are rather a primary deficit, a perceptual inability to extract socially salient information from others’ kinematics (ibid. p. 1922).

The idea that ASD involves a perceptual deficit in FV detection gains force when considered with evidence from other complementary studies. A number of authors highlight atypical processing of low-level, sensory, and perceptual information in autism (Dakin and Frith 2005; Happé 1999; Mottron et al. 2006). Children with ASD have difficulty extracting relevant social information from biological cues (Rutherford et al. 2006) and fail to interpret human activities portrayed in point-light displays (Blake and Shiffrar 2007). Children and adults with ASD struggle to process social information in facial expressions, bodily movements, and gestures (Reed et al. 2007), especially when asked to report emotions in observed facial expressions and actions (Ashwin et al. 2006; Atkinson 2009; Hubert et al. 2007; Teunisse and de Gelder 2001). This perceptual deficit appears to be cross-modal. For example, there is evidence that individuals with ASD have difficulty identifying emotions from vocal cues (Philip et al. 2010); some children with ASD exhibit a general lack of responsiveness to human voices (Bruneau et al. 2003; Rogier et al. 2010). Taken together, these studies support the idea that a perceptual impairment in recognizing FV (i.e., style blindness) may be a core constituent of social deficits in ASD, one with cascading effects on the ability of people with ASD to connect and share with others in everyday life (Gallese and Rochat 2018, p. 158).

If people with ASD quite literally cannot see some expressive qualities of neurotypical FV, they cannot respond to them, although as we saw previously, they may nevertheless be aware of their existence via the negative evaluations of others. This insight can help us better understand the phenomenology of ASD in a number of ways. Not only can it help neurotypicals better understand what it’s like to perceive the social world from the perspective of someone with ASD. It can also help clarify why people with ASD often experience anxiety when negotiating the norm-structured environments of the neurotypical social world. This anxiety may stem from an awareness of perceptually-inaccessible information, embodied in FV, that most neurotypicals take for granted; this information is needed to responsively conform to, and be regulated by, social environments at home, work, or play (McGeer 2001). Crucially, it is awareness of the background presence (but perceptual absence) of these (neurotypical) norms, embodied in (neurotypical) FV, that makes the difference. This is confirmed by the fact that high-functioning autistic people report that, despite anxiety and difficulties interacting with non-autistic people, their interactions with other autistic persons are efficient and pleasurable (Schilbach 2016; see also Komeda et al. 2015). The latter are governed by ASD-friendly norms, expectations, and FV distinct from the former.

Before concluding this section, there are two important points to note. First, it is important to emphasize that ASD FV do not necessarily lack normativity, even though their kinematics may differ from neurotypical FV. Like neurotypical FV, they are responsive to particular features of the context in which they arise, and often play a central role in helping individuals more effectively adjust to and negotiate that environment. For example, consider rhythmic, repetitive movements of the body often referred to as “self-stimulation” or “self-stims” (Leary and Donnellan 2012, p. 51). Self-stims like hand-flapping, finger-snapping, tapping objects, repetitive vocalizations, or rocking back and forth are often seen as socially undesirable, meaningless behaviors. Treatment programs have traditionally tried to suppress or eliminate them (e.g., Azrin et al. 1973). However, from a sensorimotor perspective, self-stims may instead be effective ways of managing incoming sensory flows—for example, when incoming information threatens to be overwhelming (i.e., hypersensitivity), or in cases where the individual requires heightened arousal in order to more effectively access further information (i.e., hyposensitivity). If we accept something like an enactive story of perception (Noë 2004), which sees perceptual consciousness as partially constituted by bodily activities—movements of the eyes and head; focusing and refocusing attention; reaching, grabbing, manipulating, etc.—that support our skillful engagement with the world, it’s unclear why self-stims shouldn’t likewise be counted as proper parts of autistic ways of perceptually inhabiting and engaging with the world. More simply, self-stims may be enactive processes by which individuals with ASD skillfully meet the physical, perceptual, and emotional demands of their situation (Leary and Donnellan 2012, p. 51).

The second point is this: these insights suggest that the disturbance or breakdown leading to social impairment in ASD is, in an important sense, symmetrical. In light of their style blindness, it may well be that children and adults with ASD inhabit a different social-perceptual world than do neurotypicals, one organized by different FV, norms, and interpersonally-salient information (Klin et al. 2003). However, neurotypicals have as much trouble skillfully inhabiting the world of people with ASD as people with ASD have inhabiting ours (McGeer 2009, p. 310). Many of the social difficulties people with ASD face result from neurotypical FV, norms, and expectations that simply aren’t adequate to meet the FV, norms, and expectations distinctive of ASD ways of being in the world. Again, recall that high-functioning autistic people report efficient and pleasurable interactions with other autistic persons, even when they struggle to interact with neurotypicals (Schilbach 2016).

To be clear, it’s not as though people with ASD and neurotypicals inhabit entirely different worlds, of course; there are clearly areas of overlap. Nevertheless, a lesson of this enactive and DSP-inspired approach is that there are phenomenologically substantive differences in how neurotypicals and people with ASD experience and inhabit the world, respectively—differences that must be acknowledged. Working within the constraints of some of these differences, as we’ll now see, has consequences for thinking about intervention and therapeutic strategies.

5.3 Enacting inclusive FV

Since ASD has for several decades been thought to consist in a Theory of Mind deficit, prominent treatment and intervention strategies are generally geared toward helping individuals develop their individual mentalizing capacities, such as their ability to represent themselves and those around them (Begeer 2014). Based on the previous discussion, however, we can now see at least two shortcomings to such approaches. First, they overlook the role that embodied and interactive features play in shaping characteristic impairments and offer few resources for addressing these features. Second, they presuppose that social difficulties in ASD consist in a failure to conform to normative expectations of neurotypicals, without acknowledging (or offering resources to address) the two-way nature of these impairments.

So how might these enactive insights inform more effective intervention strategies? To take one example, consider ASD and music therapy, which can involve listening, singing, or joint music-making. According to Srinivasan and Bhat (2013), music-based interventions are attractive for individuals with ASD for three reasons. First, they may help address core impairments in joint attention, social reciprocity, and verbal and nonverbal communication, along with comorbidities of atypical perception, motor performance, and behavioral problems. Second, children with ASD may find these interventions particularly pleasurable since they often have enhanced pitch processing abilities and musical memory compared to typically developing children (Heaton 2003). Third, music-based activities can provide a non-intimidating context in which to interact with musical instruments and engage in predictable musically-guided interactions with social partners (Darrow and Armstrong 1999). There is evidence that music-based interventions for children with autism are effective; they can positively impact various forms of development, including communicative, social-emotional, and motor development. For example, music therapies can facilitate verbal and gestural skills in children with ASD; enhance social skills such as eye contact, joint attention, mimicry, and turn-taking; and they may also support the improvement of fine and gross motor skills (Srinivasan and Bhat 2013).

In light of the previous discussion, music therapies are potentially effective, I suggest, because musical activities like listening, singing, and joint music-making provide a regulative context in which children with ASD can—alongside neurotypicals—co-construct alternative musically-guided FV. Musical environments in this sense act as “scaffolding” for regulating attention, affect, and behavior (Krueger forthcoming b). Musically-generated auditory and rhythmic signals can regulate attention and movement in a number of ways: by influencing the timing of motor neuron discharge; decreasing felt muscle fatigue; facilitating automatic movements by providing predictable temporal cues; improving reaction time and response quality through facilitated responsive anticipation; and providing auditory feedback for proprioceptive control mechanisms (Thaut 1988, p. 130).

Rather than expecting children with ASD to responsively conform to neurotypical FV, both groups quite literally meet in the music; musical environments function as a common space for developing shared (i.e., musically scaffolded) FV in which participants are jointly responsive. Music is a particularly powerful resource for enacting shared FV. While musical dynamics may share some expressive features with human FV (Kivy 1989), they exhibit these expressive features in a non-visual format. For children with ASD, this is important. Despite significant language impairments and, as we’ve seen, difficulties seeing kinematic elements of human FV, children with ASD appear to have relatively unimpaired musical skills (Bonnel et al. 2003; Heaton 2003). So, while they may not reliably see or respond to kinematic features of neurotypical FV, they can nevertheless be responsively guided by auditory-motor dynamics found within musical FV. And within this musical context, then, children with ASD and neurotypical participants can jointly enact new and more inclusive forms of shared FV that facilitate richer social connections.

Strategies for enacting more inclusive FVs can have therapeutic significance in other domains of mental health and social functioning. Some of these strategies also involve musically-scaffolded FVs. For example, there is evidence that music can scaffold therapeutic FVs in schizophrenia (Talwar et al. 2006) and depression (Erkkilä et al. 2011), potentially leading to short-term reductions in symptoms and improvements in general functioning.

But other inclusive strategies need not involve music. Michael et al. (2015) tested the efficacy of social skills interventions for teenagers with Moebius Syndrome (MS), a rare form of congenital bilateral facial paralysis. These interventions train individuals with MS to enact alternative, non-face-based FVs (e.g., increased gesture, use of eye contact, body language, spatial configurations, prosody) to compensate for their lack of facial expressivity. It is likely that some of the social difficulties people with MS experience stem from interaction partners without MS feeling uncomfortable or confused by their facial paralysis (ibid., p. 2). In other words, like ASD, MS is a two-way impairment: people with MS fail to conform to normative expectations about how social information like feedback and appraisal should be conveyed—i.e., primarily via the face—and interaction partners without MS are thus unsure how to proceed (Krueger and Michael 2012). However, Michael and colleagues found that compensatory FVs can ameliorate some of these challenges. An increase in non-face-based FVs in people with MS can lead to an increase in similar FVs in partners without MS—and heightened levels of mutual rapport—while decreasing fidgeting and other nervous behavior. In other words, FVs are “contagious”. Individuals with MS can thus use social skills training to enact alternative FVs that, in suitably open and responsive interaction partners, may lead to greater levels of social comfort and feelings of belonging. Future empirical work might investigate which kinematic aspects of FVs (e.g., hand gestures, eye contact, prosody, postural adjustments, etc.) are more readily automated than others, and thus help clarify which aspects are subject to conscious control and which are more resistant (Michael et al. 2015, pp. 9–10).

6 Further implications and conclusion

This discussion has primarily focused on DSP and ASD. The latter, as we’ve seen, is partially constituted by distinctive FV that can be seen by others. Might this be the case for other mental disorders?

Consider DSP and schizophrenia. In the mid-twentieth century, the Dutch psychiatrist H.C. Rümke coined the term “praecox feeling” to describe an intuitive process for early identification of schizophrenia. For Rümke, praecox feeling refers to the subtle unease a skilled psychiatrist may feel when interviewing someone in the prodromal phase, prior to their first psychotic episode. This feeling arises from an inability to make affective contact with the patient; the interaction patterns that normally allow us to responsively connect with others are lacking or disrupted, and the entire qualitative structure of the exchange is somehow “off”. For Rümke, this praecox feeling is diagnostically significant. Whereas delusions, hallucinations, and severe disruptions of thought and behavior are readily-discernible symptoms of later-developing “active” or psychotic phases, prodromal symptoms tend to be subtler and more nonspecific (Yung et al. 1996). However, a skilled psychiatrist—guided by this praecox feeling—may pick up on them and use them as a heuristic for diagnostic judgments.

Rümke describes the praecox feeling this way:

Often the praecoxfeeling is felt even before one has spoken to the patient; the condition is recognized by mere observation of body-posture, facial expression, motor behavior — the whole of the person’s expressivity. It is intuitively felt that all these are disturbed, i.e., changed with respect to the norm (Rümke 1948/2012, my emphasis).

For Rümke, features of the individual’s developing schizophrenia are perceptually available in the character of their embodiment and the (disturbed) quality of their interpersonal engagements, that is, in their prodromal FV. Seeing these features is what gives rise to the praecox “feeling, induced in the clinician, [that] is the final and most important guideline” when it comes to early diagnosis (ibid., p. 194).

This is accomplished not just by becoming perceptually attuned to surface features of the patient’s facial expressions, gestures, postures, and movements but also, crucially, to the temporal and expressive dynamics of their overall style. Prodromal FV are specified by distinctive kinematic features (e.g., their “stiff, bizarre, ceremonious” character) as well as by their normative character—specifically, by the way they are consistently misaligned with physical, social, and symbolic features of the context in which they arise (“changed with respect to the norm”, as Rümke puts it). So, when Rümke speaks of the “queerness” of the “famous empty smile” of schizophrenic patients, for example, he describes perceiving an expression that not only has unusual kinematic features (stiffness, rigidity, ceremonial character held for an awkward length of time, etc.) but also one that is normatively insensitive to its context. The empty smile does not arise spontaneously, in the natural flow of conversation; it is not “geared toward the establishment of human contact” (ibid., p. 195). Instead, it is over-performed—the hyper-stylized expression of a “thoroughly isolated person” play-acting at being social and driven by a need to “re-establish contact” with others and the social world more generally (ibid., p. 195). Accordingly, this forced character is why, for Rümke, prodromal FV “lose their intentionality and mutuality”. These features of prodromal FV are perceptually available.

For our purposes, the salient point is that perception, working in tandem with the praecox feeling, can be a reliable diagnostic tool by providing direct access to prodromal features. This intuitive clinical approach is important for several reasons (Grube 2006). First, standardized classification criteria (e.g., DSM-5) are not always suitable for complex or unstable settings (e.g., emergency situations requiring a quick diagnosis in order to begin therapeutic intervention) and thus require supplementary heuristics, which DSP can provide. Second, a symptom-based diagnostic approach is on its own incapable of providing sufficient diagnostic evidence. The felt quality of the patient’s inability to connect and share with others, which modulates the intensity of the praecox feeling, needs to be part of the broad array of indices used to make diagnostic decisions. And there is evidence that this intuitive approach is diagnostically successful. Grube’s (2006) study looked at 67 previously unknown patients, all of whom displayed acute symptoms belonging to the schizophrenic spectrum. Compared with standardized diagnostic classification, the felt intensity of the praecox feeling was found to be a remarkably accurate diagnostic indicator of schizophrenia (see Varga 2013 and Moskalewicz et al. 2018 for further discussion).

These considerations support Ingerslev and Legrand’s (2017) recent call for the clinical encounter to be oriented around what they term a “responsive stance”. From this stance, bodily manifestations of psychopathology (concretely manifest in, e.g., ASD or prodromal FV) are “not reduced to signs of organic dysfunction, nor to psychosomatic manifestation, which should be turned into words” (ibid., p. 64). A reductive move of this kind presupposes that such FV are merely “observable secondary effects of [the patient’s] inner mental states”, the hidden psychic dimension of the disorder that is ultimately the target explanandum (ibid., p. 66). Instead, expressive kinematics should be seen as “modes of bodily speaking”, as they put it, which give us direct perceptual access to features of the individual’s disordered condition. Attending to these embodied and expressive features, along with the qualitative character of the clinical encounter considered as a whole, should therefore be part of the diagnostic and therapeutic process.

To conclude, I’ve considered some further consequences of ongoing debates about enactive approaches to DSP. I’ve argued that if DSP is true, we can probably perceive features of mental disorders as well. This is because some of these features are embodied in particular sorts of ways and thus fall within the scope of visual content. I’ve argued further that an enactive-inspired defense of DSP can, in this context, help clarify some ways we play a regulative role in shaping the temporal and phenomenal character of the disorder in question, and thus may have practical significance for both the clinical and therapeutic encounter. This is not to suggest, of course, that mental disorders cannot have hidden neurophysiological and phenomenological parts; nor am I suggesting that this view applies equally to all forms of mental disorders. The landscape of psychopathology is far too complex and varied. However, what this view does remind us is that, at least sometimes, others’ disordered cognitive and affective states are part of our shared perceptual world, directly accessible in many of the ways the non-disordered mental states of others often are, and thus amenable to similar degrees of empathy and understanding.