The Dialectic of Internalization/Externalization: Insights from Evolutionary & Developmental Psychology, Neuroscience & Psychiatry
In the following section, we selectively review results and insights from different disciplines in order to add empirical findings to the argument that the self can be regarded as a (historical or developmental) dialectic of internalization/externalization over multiple scales.
Across an evolutionary scale, the change to upright position comprises perhaps one of the most important qualitative leaps. In fact, bipedal walking has been crucial to the evolution of the self for various reasons. Perhaps, most importantly, walking on two feet allowed the development and use of sophisticated tools. The latter revolutionized the way humans adapt to the environment, allowing them to actively and dialectically transform the world they inhabit according to their needs. That is, it is not only humans who change the environment, but the environment in turn changes them in face of their impact on it (cf. Levins and Lewontin 1985). In brief, contrary to a perhaps common belief, humans (and other organisms) do not evolve via passive adaptation, but they fundamentally change themselves via socioculturally mediated transformations of the environment. However, having said that, this development has not come without compromises.
It has been hypothesised that bipedal walking has imposed certain constraints on the birth canal, which do not allow the birth of a fetus much older (and thus bigger) than 9 months. Additionally, according to the ‘metabolic crossover hypothesis’ (Ellison 2001; Dunsworth et al. 2012), the mother may not be able to support an older and more energetically demanding fetus. Consequently, while apes and other animals quickly master basic skills that grant them relatively early independence after birth, human infants are born unable to survive on their own. Indeed, the brain size of newborn infants is only a quarter of its fully developed size. This means that major development occurs after birth in direct interaction with the environment and others: “Maybe human newborns are adapted to soaking up all this cultural stuff and maybe being born earlier lets you do this […] Maybe being born earlier is better if you’re a cultural animal” (Karen Rosenberg on Adolf Portmann; cited in Wong 2012). Such a compromise between early independence and optimal development might actually, in and of itself, define the timing of birth.
Another major evolutionary leap with regard to human cognition is the change from individual to shared intentionality (Vygotsky 1930–1935/1978; Tomasello 1999, 2014; Tomasello et al. 2005; Tomasello and Carpenter 2007), which can be broken down into intermediate leaps (e.g. from individual to joint and from joint to collective intentionality; Tomasello 2014). The question here is: How did we go from relatively competitive great ape societies to (possibly) cooperative human cultures? It would have been a huge leap had there not been an intermediate link between our common ancestor and humans. The need for cooperation (e.g. for foraging) in early human societies may have led to the transformation of individual into joint intentionality, involving two (or a small number of) individuals (Tomasello 2014). According to this hypothesis, this development has allowed for the coordination of roles and perspectives toward joint objectives, resulting in new forms of perspectival and symbolic representations, socially recursive inference and self-monitoring (regulating one’s own actions from the perspective of a cooperative partner). The practical need for coordination might have actually prompted the development of bodily structures, which subsequently supported more abstracted cognitive functions beyond the ‘here and now’. One tempting line of thought here would be to consider human body (e.g., eye and face) and brain evolution as reciprocally driven in the context of collaborative social interaction (cf. Kobayashi and Kohshima 2001; Powell et al. 2010; Dobson 2012). From a Bayesian perspective, as one ascends the hierarchy of a neural network, information becomes more and more abstracted (e.g., from dealing with the probability of an event, to dealing with volatility, volatility of volatility and so forth; cf. Mathys et al. 2011).
Taken together, we hypothesize that this kind of evolution, which has allowed for abstraction beyond the concreteness of real-time social interactions, might have proceeded in the direction of extended bodily hierarchies.
Similarly to development at the scale of phylogeny, development at the scale of ontogeny can also be thought of as unfolding in socioculturally mediated interaction with the environment and others, undergoing a series of qualitative leaps along the lifespan (e.g., from individual to collective intentionality; Tomasello and Rakoczy 2003). More concretely, the acquisition of language, which can be considered a particularly transformative leap for social cognition and interaction, is thought to emerge out of various pre-speech communicative acts (cf. Bruner 1974). An initial basic form of dyadic interaction (between the infant and the caregiver) could serve as the substratum for the development of joint attention, as well as more complex forms of interaction. For instance, dyadic (face-to-face) and triadic (including an object) interactions have been found to be developmentally linked (Striano and Rochat 1999). Furthermore, joint attention, which is observed before fully developed social-cognitive awareness (Brooks and Meltzoff 2005), can predict future linguistic ability (Morales et al. 2000; Mundy et al. 2007). Additionally, maternal sensitivity (Hobson et al. 2004) and synchronicity (Carpenter et al. 1998) have been found to correlate with infants’ propensity to engage in social interactions and with language development, respectively.
Also in so-called psychiatric disorders, here thought of as disorders of social interaction or cases of so-called atypical social interaction, we find an interrelation between the manifestation of the organic condition and interpersonal difficulties (Vygotsky 1930–1935/1978; Schilbach 2016; Bolis et al. 2017). When it comes to autism, synchronicity in earlier play interactions between the child and the caregiver was found to correlate with the development of subsequent communicative forms, such as joint attention and language (Siller and Sigman 2002). In fact, it has been suggested that autism can be viewed not as a mere brain disorder, but rather as an evolving interpersonal misattunement encompassing various levels of description (Bolis et al. 2017). An attunement between the child and the caregiver along development is crucial in language acquisition. Yet, even when an autistic individual becomes able to talk, in most of the cases they achieve a propositional attunement (knowing that), as opposed to a pragmatic attunement (knowing how), a fact which largely prevents an intuitive participation in interactions with others. This alone, we suggest, might have direct implications in the formation of the self in autism due to the crucial dialectical nature of language.
Our discussion of tool-mediated evolution also holds for individual development: language can be viewed as a communicative tool used for transforming the (social) world, but also the self itself (Vygotsky 1934/1962). This dialectical nature of language becomes evident when examining its dual role, in speech (interpersonal) and thought (intrapersonal), which should be thought of in unity, rather than in external (even tight) relation (Vygotsky 1934/1962). In other words, contrary to a common assumption that speech is merely an enacted thought, speech and thought unfold together, inextricably entangled. Indeed, recent evidence demonstrates neural coupling during production and comprehension of real-life speech (Silbert et al. 2014). Importantly, the interpersonal aspect of language should still be thought of as temporally and conceptually preceding the intrapersonal one. That is, in contrast to a Piagetian perspective, we adhere to the Vygotskian idea that it is social interaction that drives development and not vice versa.
In sum, basic forms of interpersonal sensorimotor contingencies gradually evolve into more complex forms of interaction, such as joint attention and multi-person interactions. These initial social interactions might be exactly what (reciprocally) drives the development of social cognition for dealing with what lies beyond the ‘here and now’ (cf. Theory of Mind; Baron-Cohen 1991; Tomasello 1995). At the neural level, it has been suggested that joint attention might be the outcome of two interacting systems, namely the posterior and the anterior attention system (Mundy and Newell 2007). The posterior system, which is relatively involuntary and common to many primates, begins to develop during the first months of life and can be, simply speaking, thought of as serving an understanding of “where others’ eyes go, their behaviour follows” (Jellema et al. 2000; Mundy and Newell 2007). The anterior system, which is considered volitional and goal-directed, develops later and can be, along similar lines, thought of as serving an understanding of “where my eyes go, my behaviour follows” (Mundy and Newell 2007). We take this as suggesting that the ‘self’ develops in tight connection with the understanding of the ‘other’ and that, in fact, the latter precedes the former.
It might actually be the case that it is exactly in our effort to understand others that we develop an understanding of ourselves. Here, three tangled modeling loops are considered: (i) the inner loop, dealing with the prediction of internal bodily processes (cf. interoception), (ii) the perception–action loop, which involves the anticipation of the consequences of one’s actions on the world and (iii) the self-other loop, which deals with modelling other minds (Timmermans et al. 2012). It is exactly the latter loop that, through social interactions, might ontogenetically forge sophisticated bodily structures that are later deployed for reflective social cognition (e.g., Theory of Mind; Schilbach et al. 2010, 2013; Frith and Frith 2012), via neural reuse (Anderson 2010). There is empirical evidence suggesting that unconstrained cognition, emotional processing and social cognition might all share common neural networks in the dorso-medial prefrontal cortex and in the precuneus (Schilbach et al. 2012). Interestingly, these brain networks partially comprise the Default Mode Network, which is putatively more active when a person does not directly focus on the outer world. Such a neural overlap between ‘social cognition’ and ‘introspection’ can be taken to suggest that not only thinking about others (either implicitly or explicitly), but even thinking about ourselves is driven by social interactions.
Taken together, we construe the self as a historical process of dialectical attunement unfolding over various time scales (Fig. 2). More concretely, we view two cardinal groups of processes as dialectically interconnected, namely internalization and externalization. These processes are thought of as unfolding along different time scales, e.g., (i) in the time frame of evolution, involving genetic and environmental adaptations, (ii) across generations, as cultural practices, or (iii) during individual development, including bodily and world reconfigurations, such as perception, action and learning. Put simply, we consider both low- and high-level attunement. Low-level attunement emerges during collective behaviour, when people are coupled together or when they coordinate (cf. De Jaegher and Di Paolo 2007). However, while people interact, and thus act upon and perceive each other, they mutually co-construct internal models across multiple levels of bodily hierarchies. As we saw before, the construction of such hierarchies allows for the consideration of increasingly higher levels of abstraction and thus temporal scales. That means that people in social interactions co-construct each other not only in the ‘here and now’, but also beyond, via co-configuring higher-level abstracted beliefs and patterns of action, which remain available in future instances across a variety of interactive contexts or privately (cf. Theory of Mind; Fig. 2). Simply speaking, poetry (from the Greek “poiesis”, literally meaning “making”) can be thought of as an active externalization of internalized social interactions.
Internalization is the set of processes via which the structure of the environment (e.g., social relations) is actively transformed and implemented within an individual. From a Bayesian perspective, internalization entails the creation and maintenance of dynamic hierarchical models of the world in an effort to effectively predict future changes and act accordingly. We consider internalization as being accomplished across various time scales, from genetic information encoding and cultural adaptation, to bodily reconfiguration across development and real-time perception. For instance, on the evolutionary scale, the human visual system is attuned to the peak of the solar radiation spectrum that reaches the surface of the earth. In other words, the human species has bodily internalized the environment in terms of electromagnetic conditions. Interestingly, similar attunement to environmental conditions is also observed along developmental scales. For instance, experiments have demonstrated that extreme exposure to a restricted range of visual stimuli (e.g., exclusively vertical visual orientation), early in development, modifies the morphology of neurons in visual cortex accordingly (e.g., Tieman and Hirsch 1982). Furthermore, with regard to shorter time scales, perception and action can be seen as real-time bodily attunement to the environment. Finally, people are undeniably also culturally attuned in multiple respects. For instance, what is considered beautiful or delicious seems to differ across sociocultural contexts, both across time and space.
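The Bayesian reading of internalization given above can be illustrated with a minimal sketch. It assumes a single Gaussian belief about one environmental quantity, which is a strong simplification of the dynamic hierarchical models the text refers to. The function name `update_belief` and all numeric values are hypothetical and serve only as illustration.

```python
def update_belief(mean, precision, observation, obs_precision):
    """Precision-weighted Gaussian belief update (conjugate normal case).

    The prediction error is weighted by the reliability (precision) of the
    observation relative to the precision of the updated belief.
    """
    prediction_error = observation - mean
    new_precision = precision + obs_precision
    new_mean = mean + (obs_precision / new_precision) * prediction_error
    return new_mean, new_precision


# Hypothetical example: a vague prior belief is gradually attuned
# to a repeatedly observed environmental value.
mean, precision = 0.0, 1.0
for observation in [2.0, 2.0, 2.0]:
    mean, precision = update_belief(mean, precision, observation, obs_precision=1.0)
print(mean, precision)
```

With each observation the belief moves toward the observed value while its precision grows, so later prediction errors shift the estimate less. In this toy sense, the structure of the environment becomes "implemented" in the parameters of the model.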
In fact, humans used cultural models for describing, predicting and manipulating the environment already in the cradles of civilization. For instance, ancient societies have construed natural phenomena, such as weather or earthquakes, as behavioural expressions of personified deities. At first sight, this might appear as a rather naive approach. However, we consider this as an ingenious tactic that might have allowed pre-scientific communities to recruit powerful cognitive capacities, originally developed for dealing with the undoubtedly complex social realm. Any level of abstraction can be considered as a model of the world. To come back to the example of language, a word can be thought of as a sociocultural model in and of itself, which of course presupposes the evolution of both the necessary biological apparatus across evolution and an interpersonal attunement across development. For instance, the word ‘animal’ or ‘wave’ practically captures and summarizes higher level similarities being met in a plethora of diverse natural processes. Here, we should stress that we do not consider the construction of internal models as a passive accumulation of representations.
The construction of internal models allows not only for the prediction of the world, but also for its (sociocultural) transformation to meet survival needs, through collective externalization. In other words, dialectical attunement does not merely imply a one-sided adjustment of the individual to the environment, but also a transformation thereof across multiple scales: from cooking food, building shelters and developing technology, to transforming social structures and domesticating animal species. The activity of an individual in everyday life is decisively modulated by evolutionary, cultural and developmental factors. For instance, the use of a tool is defined by human anatomy, accumulated collective knowledge and individual learning. As discussed above, though, a change of the environment inherently entails a reconfiguration of the self as well. Externalization impacts on internalized models directly (cf. the interplay between active inference and predictive coding), as well as indirectly via the feedback of a transformed world. For example, learning to use a tool is fundamentally different when it is enacted rather than being merely theoretical, even though in both situations an internal model is developed. Additionally, both mechanical and conceptual tools (see the example of ‘wave’ from above) have helped the construction of modern technology, which in turn continuously modifies humans in multiple aspects and scales (from everyday behaviour to cultural habits and genetics in the long run). Crucially, when it comes to humans, transforming the world is fundamentally social, both with regard to our impact on others and on the environment: the former is inherently social, while the latter becomes such via the mediation of sociocultural tools. In sum, we view the self exactly as the dialectic of the abovementioned internalization and externalization processes.
We will come back to this point and its scientific and societal implications during our concluding remarks (Sect. 2.3), after first describing how our hypotheses could be put to the test scientifically. To this end, we will describe experimental and data analytic means for studying the dialectic of internalization and externalization in real-time social interactions and beyond.
Two-Person Psychophysiology & Multi-level Accounts of Intersubjectivity
Due to conceptual and methodological constraints, research has largely focused on either intrapersonal (e.g. neurobiological and psychological) or interpersonal (e.g. socio-cultural) processes. Here we emphasize the importance of studying intrapersonal and interpersonal processes in their inherent interrelation, as they unfold during social interactions. In what follows, we describe an experimental framework, namely two-person psychophysiology, and an analysis scheme, namely multi-level analysis of intersubjectivity, that could help us do so.
Two-person psychophysiology appears as a promising avenue for empirical research, which while offering great experimental control, also preserves adequate degrees of ecological validity (Bolis and Schilbach 2017a, b). Traditionally, psychophysiology has enabled the empirical investigation of the relation between physiological and psychological processes (e.g., through physiological monitoring and introspection), offering important insights about individual mechanisms. However informative this kind of approach may have been, the concept of the (a-)typical ‘self’ will remain largely misconstrued until dynamic interpersonal processes are systematically considered, as social cognition might be fundamentally different when we are in interaction with others rather than merely observing them (Schilbach et al. 2013). It has been argued that the most important experience of the other comes from face-to-face situations; that this is the archetypic situation of social interaction, while all other situations are products of it (Berger and Luckmann 1967). It is exactly in this kind of situation that the ‘here and now’ of each other’s subjectivity come together and possibly form an inextricable intersubjective unity (Berger and Luckmann 1967; De Jaegher and Di Paolo 2007; Bolis and Schilbach 2017b).
Building upon empirical frameworks of interpersonal research (e.g. Read Montague et al. 2001; Schilbach et al. 2006; Dumas et al. 2010; Barišic et al. 2013; Froese et al. 2015; Koike et al. 2016; Liu et al. 2016), two-person psychophysiology crucially allows for the empirical investigation and systematic manipulation of face-to-face social interaction, across various modalities and temporal scales. In such a framework (Bolis and Schilbach 2017b), participants sit opposite each other, working on tasks either individually or collectively, while being able to interact, either in real-time or offline, through a micro-camera communication system. Such a two-person framework allows for systematic control and monitoring of processes that live at different levels of description, from (epi-)genetics and culture to interpersonal behaviour and psychophysiology. In fact, by controlling the synchronicity of social interaction and the composition of dyads, cardinal aspects of the self can be put to scientific test: Emerging contextual and interpersonal differences in social interactions might prove equally, or even more, important than individual traits in defining the becoming of the (a-)typical self (Bolis et al. 2017).
Interpersonal frameworks for empirical research might be an important tool for moving beyond the individual as the unit of analysis, yet not sufficient on their own. Conceptual and experimental practices should be developed hand-in-hand with methods of analysis (e.g. Bahrami et al. 2010; Konvalinka and Roepstorff 2012; Schilbach et al. 2013; Abney et al. 2014; Dumas et al. 2014; Froese et al. 2015; Friston and Frith 2015; Zapata-Fonseca et al. 2016; Fusaroli and Tylén 2016; Sevgi et al. 2016; Bolis and Schilbach 2017a). Here, we suggest a shift from an exclusive focus on the (Bayesian) brain in isolation, toward a multilevel understanding of intersubjectivity and psychopathology. In this framework of analysis, principled accounts of brain function (e.g. predictive processing) are employed for describing crucial neurobiological mechanisms, while being connected to real-life phenomena, which by definition live in an interpersonal space. More concretely, grounded in established models (e.g., Daunizeau et al. 2010; Mathys et al. 2011; Bolis et al. 2015), a two-level modelling scheme could be used for capturing both individual processes (Bayesian level) and collective behaviour (meta-Bayesian level). Put simply, in this scheme intrasubjective parameters will be deployed for capturing individual mechanisms (e.g., neuromodulation), while intersubjective ones to describe emergent processes on the collective level (e.g., interpersonal coupling). Collective parameters refer to sociocultural tools, such as artefacts, communication mediating factors, and generally any co-constructed and commonly held convention. For instance, the efficacy of a communication channel might strongly modulate interpersonal coupling in social interaction (Bolis and Schilbach 2017b).
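The distinction between intrasubjective and intersubjective parameters can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions and not the two-level (Bayesian and meta-Bayesian) scheme itself: each agent is reduced to a single state that is nudged toward its partner, the per-agent `learning_rates` stand in for intrasubjective parameters, and the shared `channel_efficacy` stands in for an intersubjective one. All names and values are hypothetical.

```python
def simulate_dyad(learning_rates, channel_efficacy, steps=50):
    """Simulate two agents that adjust their states toward each other.

    learning_rates: per-agent (intrasubjective) parameters, hypothetical.
    channel_efficacy: shared (intersubjective) parameter in [0, 1] scaling
        how much of the partner's state gets through the channel.
    Returns the absolute gap between the two states after each step.
    """
    state_a, state_b = 0.0, 1.0
    gaps = []
    for _ in range(steps):
        # Each agent's prediction error about the partner, attenuated by the channel.
        error_a = channel_efficacy * (state_b - state_a)
        error_b = channel_efficacy * (state_a - state_b)
        # Both agents update simultaneously, each with its own learning rate.
        state_a += learning_rates[0] * error_a
        state_b += learning_rates[1] * error_b
        gaps.append(abs(state_a - state_b))
    return gaps


good_channel = simulate_dyad((0.2, 0.1), channel_efficacy=1.0)
poor_channel = simulate_dyad((0.2, 0.1), channel_efficacy=0.1)
print(good_channel[-1], poor_channel[-1])
```

In this sketch the gap shrinks by the factor 1 - channel_efficacy * (sum of learning rates) at every step, so the same two individuals end up more or less tightly coupled depending only on the shared channel parameter. An actual analysis would estimate such parameters from observed behaviour rather than fix them by hand.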
Such an intersubjective scheme could be exploited for considering emergent phenomena at higher levels of description, such as questions about the autonomy of a dyad or a group of people. To give a more specific example, in the context of collective externalization a non-linear model might explain observed behaviour optimally, thus providing evidence that the group is different from the sum of individuals. Conversely, this framework could address questions about how collective processes, in turn, shape individual reality. For instance, one could differentially study the potentially distinct impact that a competitive or individualistic versus a collaborative structure might exert upon an individual (Bolis et al. 2017). Collective activity and societal structure are thought of as being capable of shaping individual levels (from neurobiology to phenomenology) via internalizing mechanisms. In other words, it is not only lower-level mechanisms that result in emergent collective ones, but internal processes are treated, here, as dynamically internalized interpersonal processes.
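The idea that a non-linear model may explain group behaviour better than a sum of individual contributions can be sketched as a simple model comparison. The data below are invented for illustration, the balanced two-level design is an assumption that keeps the least-squares fit trivial, and the Bayesian information criterion (BIC) is used here only as a rough stand-in for model evidence; none of this is taken from the cited studies.

```python
import math

# Hypothetical balanced design: each member's contribution is coded
# -1 (low) or +1 (high), with two observations per combination.
# The outcome values (third entry) are invented for illustration.
data = [
    (-1, -1, 1.1), (-1, -1, 0.9),
    (-1, +1, 0.1), (-1, +1, -0.1),
    (+1, -1, 0.2), (+1, -1, -0.2),
    (+1, +1, 3.1), (+1, +1, 2.9),
]


def fit(rows, regressors):
    """Ordinary least squares for mutually orthogonal +/-1 regressors.

    In a balanced +/-1 design each coefficient is simply mean(y * regressor).
    Returns (coefficients, residual sum of squares).
    """
    n = len(rows)
    coefs = [sum(f(a, b) * y for a, b, y in rows) / n for f in regressors]
    rss = sum(
        (y - sum(c * f(a, b) for c, f in zip(coefs, regressors))) ** 2
        for a, b, y in rows
    )
    return coefs, rss


def bic(rss, n, k):
    """BIC for a Gaussian-error model with k coefficients (lower is better)."""
    return n * math.log(rss / n) + k * math.log(n)


# Additive model: the group as the sum of individual contributions.
additive = [lambda a, b: 1, lambda a, b: a, lambda a, b: b]
# Interactive model: adds a non-linear (multiplicative) term.
interactive = additive + [lambda a, b: a * b]

coefs_add, rss_add = fit(data, additive)
coefs_int, rss_int = fit(data, interactive)
n = len(data)
print(bic(rss_add, n, 3), bic(rss_int, n, 4))
```

For these invented data the interactive model attains the lower BIC despite its extra parameter, which is the kind of result that would count as evidence that the dyad is more than the sum of its members. With real data the comparison could of course favour the additive model instead.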
Notably, a meta-Bayesian framework can consider observable activity at any level of description, such as neural activity, motor responses or collective behaviour. With regard to social interactions, an interesting avenue for future research might involve studying whether interpersonal coordination on the behavioural level might actually serve as a prior and modulate, or even relax, the need for inferences about the hidden causes of social behaviour. Furthermore, at a neurobiological level, we hypothesize that the activity of different neuromodulators could be related to a subject’s ability to track different levels of interpersonal regularities. In short, a Bayesian account of intersubjectivity intends to offer a principled and quantitative description of the dialectic between internalizing and externalizing processes across different levels of description, as discussed above.
The Dialectical Self: Scientific and Societal Relevance
Our approach shares common ground with, and most importantly brings together under a dialectical umbrella, two seemingly disparate perspectives, i.e., interactionist-enactivist (e.g. Maturana and Varela 1980; De Jaegher and Di Paolo 2007) and computational-Bayesian accounts of cognition (e.g. Clark 2013; Friston 2013). Enactivist accounts have constructively put their focus on the fundamental role of interaction and coupling with the environment, including others. Bayesian accounts of cognition have provided important computational tools for describing individual cognition, mainly through hierarchical models. Our dialectical suggestion, on the one hand, emphasizes the primacy of (social) interactions. More concretely, it states that for a comprehensive understanding of the (a-)typical self, we will need to move beyond the individual, to the historical unfolding of (social) interactions over multiple scales. On the other hand, our approach extends Bayesian accounts of cognition by situating them in the context of real-time social interaction and providing a description of internalization and collective externalization processes beyond the individual. More precisely, it connects internalization to predictive coding and collective externalization to active inference. By doing so, it describes perception, learning and collective action as a unified process that allows for aligning personal (psychophysiological) and interpersonal (coupling and synchrony) states with environmental (nature and others) conditions. Taken together, by integrating levels of description and time scales, such an approach provides a unifying and principled way of studying the self beyond the individual.
In this article we have described the self as the dialectic of internalization and externalization and more concretely as a historical product of dialectical attunement over various temporal scales (see Fig. 2). According to this view, low-level attunement is achieved largely automatically (beyond awareness) during embodied interactions, via mechanisms of collective externalization. High-level attunement is achieved through mechanisms of internalization. For instance, low-level attunement captures human action as an emergent collective phenomenon (cf. interpersonal bodily coupling, coordination and synergy) in the ‘here and now’. High-level attunement captures human mind as an active environmental reflection. In a cultural frame, this takes the form of internalized values and conventions in a society, generalized across multiple temporal and contextual frames. In sum, low- and high-level attunement are dynamically and cumulatively interrelated, via internalization and collective externalization processes, forming the dialectical self.
Yet one might still wonder why the question of the self should be raised at all. We believe that any thesis on the self is inherently implicated in numerous fields of science and society. A dialectical perspective, such as the one described here, points toward specific directions that acknowledge the primacy of the social, without neglecting the importance of the individual in their interrelation, co-construction and tension. Additionally, it points toward the necessity of adopting an empirical and principled approach to studying the self. To this end, formal approaches of predictive processing and dynamical systems appear most promising. Approaching the formation of the self under the unifying umbrella of the dialectic of internalization/externalization might allow formal integration and re-description of seemingly disparate mechanisms across different scales. Yet the implications of such a dialectical approach reach further than the realm of scientific research.
In pedagogy, this is translated into an educational system that would promote collective problem solving as compared to mainstream competitive individual tests. Put simply, taking such an approach seriously, it would make no sense to isolate inherently limited individual cognitive capacity and reward merely the most relevant to a given task. On the contrary, promoting collective problem solving and decision making via active participation and interaction would enhance both cognitive and motivational aspects, yielding superior pedagogical but also practical achievements. In psychiatry, one would not be merely focused on diagnosing and ‘fixing’ individual impairments, but also tuning interpersonal communication and enhancing social inclusion (Fig. 3; Bolis et al. 2017). Within a clinical context, such an approach would suggest the monitoring of not only individual progress, but also interpersonal coupling between a ‘therapist’ and the ‘individual’, as well as between multiple persons during group therapy. In fact, not every therapist might be optimally suited for every patient and therefore matching of therapist and patient might need to be assessed in order to predict whether therapy will eventually work. Within a societal context, ‘tuning’ will not target only the individual with a psychiatric condition, but also her social environment. For instance, anti-stigma and informational campaigns will target tuning of social expectations of others as well, effectively resulting in a reciprocal amelioration of existing interpersonal misattunement. Such developments might help bring a redefinition of what a psychiatric disorder is, situating it back into the social realm within which it emerges.
In the field of ethics and law, seriously assimilating the idea that the self goes beyond the static individual, a juridical system would not only focus on individual intentionality and responsibility, but also take into account collective factors and societal structure. Along similar lines, confronting social problems such as racism will not merely address educating individuals, but also dealing with social structures, which potentially instigate and maintain such patterns of behavior. Finally, such a perspective would suggest developing artificial intelligence and robotics, not via static pre-configuration, but via allowing interaction for co-constructing and internalizing knowledge. This should be expected to yield not only more robust artificial systems, but insightful conclusions on cardinal questions about human cognition as well. More concretely, in line with cultural historical and enactivist perspectives, we suggest that the role of social interaction and active participation in the co-construction of a culturally shaped self should be taken more seriously, in both research and social practice, as paraphrasing Descartes: ‘we interact, therefore I become’, or put simply ‘I interact, therefore I am’.