I engage phenomenological and empirical perspectives on dialogical relations in infancy in a mutually enlightening and challenging relation. On the one hand, the empirical contributions provide evidence for the primacy of first-to-second person interrelatedness in human sociality, as opposed to the claim of primary syncretism heralded by Merleau-Ponty, and also in distinction from the ego-alter ego model routinely used in phenomenology. On the other hand, phenomenological considerations regarding the lived affective experience of dialogical relatedness enrich and render intelligible the psychological accounts of dialogue in terms of observable behavior. Phenomenological and empirical perspectives on dialogical relatedness thus combine to offer an affectively charged and conversationally patterned notion of primary intersubjectivity in the I-you mode.


1 Introduction

The interdisciplinary ambitions of phenomenology and the cognitive sciences are probably best pursued within the context of Merleau-Ponty’s philosophy. Throughout his investigations into the embodied, situated existence, Merleau-Ponty adopted a deliberately multidisciplinary perspective that anticipated the recent scholarly trends to combine reflective and empirical approaches. I believe that Merleau-Ponty would be pleased to witness this recent revival of research at the interface of phenomenology and the cognitive sciences, because it testifies to the philosopher’s own commitment to engaging first- and third-person perspectives on human being in a mutually enriching dialogue. Merleau-Ponty consistently argued that phenomenology needs to transcend itself and work at its limit if it is to remain a “concrete philosophy.”1 Phenomenology can only abandon its idealist pretensions if it complements first-person reflection on experience with third-person observation, or better still, if it revises the conceptual framework according to which there is philosophical consciousness on the one hand, and rich but blind empirical knowledge on the other.2 An interdisciplinary project must seek to infuse empirical data with meaning via trained philosophical reflection, as much as it needs to root consciousness in body and nature. This means that phenomenology needs to be informed and possibly challenged by the developments in science, but it means also that phenomenology should have a say in how scientific experiments are designed and what theoretical frameworks are being used to interpret the findings gathered in the laboratory.

Merleau-Ponty’s remarks about the need to combine and revise first- and third-person approaches are echoed in recent debates about what methodology to use in interdisciplinary research.3 I believe that the method of “neurophenomenology,” proposed by Varela,4 which stipulates that the disciplines based on first- and third-person methodologies should enter in a relation of mutual constraint and enlightenment, realizes Merleau-Ponty’s own vision, and is preferable to other existing methods, such as Daniel Dennett’s deflationary proposal of “heterophenomenology.”5 Neurophenomenology is especially productive in cases of conflict between views espoused by phenomenologists and natural scientists, in that it allows the disciplines to throw a critical light on each other and also to stimulate their respective developments.

In my paper, I will throw some light on how to pursue this interdisciplinary line of research by engaging Merleau-Ponty’s stance on intersubjectivity in a mutually constraining and enlightening exchange with recent developmental studies in psychology, notably the studies on emerging dialogicality in infant–mother interactions.6 These studies motivate a revision of the phenomenological view regarding the primacy of syncretism, and suggest the alternative view that dialogical relations are primary in human sociality. Such studies in developmental psychology serve, therefore, to constrain the dominant account of sociality given by genetic phenomenology, which describes it in terms of a fusional symbiosis between self and other, and provide evidence that self and other, while intimately interrelated, are non-identical from birth.7 I propose to discuss this interpersonal interrelatedness by focusing on the phenomenon of “conversational congruence” that is observable in typical face-to-face non-symbolic interactions between infant and mother (or other caregiver). In other words, I propose to examine Bateson’s and Trevarthen’s foundational concept of proto-conversation, which points to complex pre-linguistic fluencies in mother–infant interrelations that are carried over in later development to proto-linguistic aspects of adult conversation, and to pose it as an explicit alternative to Merleau-Ponty’s syncretism hypothesis.

At the same time, and in the service of a truly bidirectional interdisciplinary engagement between phenomenology and the cognitive sciences, I propose to offer phenomenological resources for interpreting as well as challenging the empirical perspectives on dialogical coordination between mothers and infants. The observable data on dialogicality identified by current empirical research on pre-verbal infancy need to be enriched by—and in fact remain indecipherable without—phenomenological methods and contributions (as difficult as obtaining phenomenological reports on infancy turns out to be). The observable phenomenon of conversational congruence from empirical studies needs therefore to be enlightened by the felt phenomenon of “good vibrations,” about which a descriptive first-person analysis of a well-flowing conversational interaction provides rich insight. The feeling of good vibrations, as experienced in a harmonized exchange with a conversational partner, is therefore an intrinsic element of dialogical competence. The third-person account of conversational congruence must be complemented by a descriptive first-person account of good vibrations if the phenomenon is to retain the meaning it bears within an affectively lived situation. Phenomenological resources are therefore requisite to provide a robust and complex story about what it means to dialogically engage with others.

2 From syncretism to dialogicality

Few phenomenological theories of intersubjectivity frame the question in explicitly genetic terms, i.e. in terms of the development of intersubjective relatedness from the earliest moments of human life. Merleau-Ponty’s account is a notable exception in this regard. He consulted the developmental psychological as well as the psychoanalytic and neurological contributions of his time with a view toward providing a rich multidisciplinary perspective on intersubjectivity. He was particularly influenced by Piaget’s claim of the infant’s initial adualism, i.e. the purported inability to differentiate between self and non-self due to the relative developmental immaturity of the infant. Piaget argued that the infant is unable to correlate the so-called visual and tactile-kinesthetic schemas, i.e., to intermodally link the visual information received from the outside with the tactile and kinesthetic sensations originating in her own body. That is why, the theory goes, the infant could not engage in face-to-face interactions with an adult, such as mimicry of simple facial expressions, for she is unable to connect the visually perceived expressions on the other’s face with her own proprioceptively felt movements. This denial of neonate imitation by psychologists received neurological backing from the contention, cited by Merleau-Ponty, that neurological pathways are incompletely myelinated at birth, hindering the infant’s proprioceptive awareness of her own body and thus excluding any sense of individuated selfhood. According to this account of development, the infant was thought to live in an anonymous, undifferentiated state. Variations of this view were shared by a wide range of theorists, including Guillaume8, Wallon9, Lacan10, as well as Merleau-Ponty, who embraced the views of many of these authors in his Sorbonne Lectures on child pedagogy and psychology. These views postulated the primacy of self-other confusion as the original “syncretic” state of human sociality. 
A classic example of this syncretism was located in “transitivism”: for example, in the so-called “emotional contagion” reported in neonate nurseries where the crying of one infant would spread to all the others in the vicinity, regardless of their prior emotional state. Merleau-Ponty agreed that “indistinction of the two personalities ... makes transitivism possible”11 and he had valid reasons to believe that this indistinction followed from the developmental immaturity of the infant.

Over the last three decades, in many parts of the world, extensive experimental studies have been conducted which conclusively demonstrate that newborn babies are in fact able to imitate simple facial gestures of adults, such as tongue protrusion and mouth opening.12 Importantly, this facial mimicry does not operate in a reflex-like fashion where the visual stimulus would automatically elicit the infant’s mirroring response; rather it takes the form of a learning process in which the infant gradually approximates the perceived facial gesture in its own motor performance. The condition sine qua non of such gradual approximation of a visual model is that the infant be able to monitor and correct the gesture she performs by means of proprioceptive feedback from her own body. This process implies that the newborn baby possesses an innate body schema which facilitates a basic, rudimentary awareness of self as an organized embodied agent with a set of motor possibilities. Furthermore, the infant clearly registers the non-identity between her own proprioceptively felt gesture and the visually perceived gesture of the adult. In order to be able to grasp the other’s gesture as a model to be imitated, the infant must be aware of the distinction between what the other does from what the infant herself (feels that she) does. It can therefore be concluded that neonate imitation relies on a primitive sense of self and a minimal distinction between self and non-self.

As Gallagher and Meltzoff observe, Merleau-Ponty would most likely not have objected to the argument that imitation requires the correlated awareness of self and a distinction from non-self, and had he been exposed to the more up-to-date research, he most likely would have sided with the idea that intersubjective relatedness does not emerge out of initial fusion but is a primary given of human life, manifest from birth on. Elsewhere I have argued that such a revision of Merleau-Ponty’s views on intersubjectivity would have profound implications not just for his thinking about child psychology and pedagogy, but also for his later ontological philosophy of the flesh.13 This later ontology, which assigned a primacy to intersubjectivity over against Merleau-Ponty’s earlier commitment to perception, was significantly influenced in its key ontological notions, such as anonymous life, by the psychological hypotheses of adualism and syncretism pervasive at Merleau-Ponty’s time. I argued that an ongoing interdisciplinary dialogue between phenomenology and the cognitive sciences, such as developmental psychology, is necessary to rejuvenate phenomenological research. This is meant in the sense of phenomenology being both informed by contributions from the empirical disciplines and serving as a valuable theoretical tool which can help with experimental design in empirical research and with the interpretation of the data by the scientist in the lab. In other words, the continued validity and relevance of phenomenological methods is best evidenced not only by a unilateral influence of empirical sciences upon phenomenology but also by a reverse relation of phenomenological insights shedding light on the research conducted in cognitive science.

Empirical research on the conversational or dialogical dynamic of interpersonal relations in infancy has not yet, to my knowledge, been examined in phenomenological accounts of intersubjectivity. This research investigates the partial overlaps between non-verbalized dialogue between an infant and her caregiver and the verbalized dialogue between competent mature speakers, and its relevance for developing a rich theory of intersubjectivity is manifold. The research addresses the temporal patterns of interpersonal interaction, it shows how both adult and infant actively influence the relational process even though they do not share the same degree of agency, and it may also shed light on the acquisition of language as a process emerging out of the earliest patterns of interpersonal interaction. I argue that the research also helps us thematize the phenomenon of affectivity in terms of perceptible “good vibrations” with others—a phenomenon which may assist the development of emotional and moral bonds of empathic understanding. These insights augment phenomenological conceptions of intersubjectivity as the fundamental and primary condition of human life, in agreement with some of Merleau-Ponty’s14 and Husserl’s15 remarks about the primacy of intersubjectivity. However, they also demonstrate that intersubjective relatedness is a process that needs to be cast in explicitly dialogical terms, as an interactive, reciprocal first-to-second person exchange. They resist, therefore, the traditional phenomenological terminology of the ego and the alter ego that is routinely used in reference to intersubjectivity. Rather than indicating a dialogical relation, this terminology suggests a relation between subjects that are confined to a mentalistic, first-person mode of experience and that relate to others only through a detached mode of non-participatory observation. 
As I have argued elsewhere, the ego–alter ego model fails to capture, and effectively covers over, the first-to-second person relatedness of dialogical interaction. It therefore needs to be displaced in favor of the thesis of primary I-you connectedness if phenomenological accounts are to do justice to the lived reality of social life.16

Colwyn Trevarthen17 is one of the pioneers of research about the earliest manifestations of interpersonal interaction, and the dialogical character of this process. He is one of the key defenders of the view that the infant is literally born into intersubjectivity. Basing his analysis on microanalyses of filmed interactions between infants and their mothers, he showed that the movements of eyes, hands and mouth are rhythmically timed in a turn-taking form with the adult. Involving vocalization, touch, and gaze, these rhythmically timed non-verbal exchanges between infant and caregiver follow a give-and-take, address-and-reply pattern in which both partners attend to one another and mutually coordinate their actions. For this reason, Trevarthen viewed these exchanges as being of a conversational type and termed them, following Mary Bateson,18 proto-conversations. Even though the mother initiates and supports the exchange, the infant is actively involved in this communicative process, which belongs to what Trevarthen calls our primary intersubjectivity. This view argues that, beginning at birth, the infant’s interactions with others are characterized by a degree of decentralization and attunement to other perspectives. Importantly, this research demonstrates that the earliest interpersonal exchanges bear a number of structural similarities to adult conversations.

According to recent research, face-to-face interactions in infancy follow dialogical rhythms that resemble the rhythmic patterns of verbalized conversations in adulthood. For example, infants and their mothers alternate between vocalization and silence or looking and looking away in a recurrent and non-random manner (albeit not in a manner that is strictly regular or periodic, such as a heartbeat or marching beat). The participants in a dyadic exchange mutually influence each other’s cycles of activity and receptivity. A dialogical or conversational competence of a non-symbolic type, which precedes language-based dialogue, can thus be seen to operate from the earliest moments of human life. Some elements of these early non-symbolic dialogical rhythms, such as gaze patterns, remain operative within adult dialogue. Infants are therefore partially skilled in areas of communication that belong to adult dialogical repertoire.

Consider the various features of infant dialogical exchanges. The earliest interactions exhibit a degree of mutual influence between infant and adult. They are context sensitive. They follow a turn-taking pattern, where both participants occupy the interchangeable roles of agent and recipient of action in a non-random sequence. As such, they bear resemblance to the verbalized exchanges between two interlocutors who alternate between the active/speaking and the receptive/listening roles. They exhibit a face-to-face orientation, which is typically preserved within dialogue in adulthood. Combined, these factors provide strong evidence that verbalized interaction between speaker and addressee emerges out of the earliest pre-linguistic self-other relations in infancy. This line of thought accords with that of Lyons who argued that languages have “developed for [the purpose of] communication in face-to-face interaction.”19 The developmental perspective helps explicate in some detail how and why the earliest face-to-face interactions may set the stage for the mastery of symbolic communication.

The notion of dialogical relations in infancy has become relatively well established in psychology. Spitz introduced the notion of mother–infant dialogue in the context of psychoanalytic theory. He argued that reciprocal exchanges between mother and infant were crucial to developing the feeling of being responded to and a sense of identity.20 Together with the spread of developmental research in the 1970s, the study of dialogical relations involving infants significantly increased. Numerous researchers studied mother–infant interaction in diverse sense modalities, including gaze, vocalization, gesture, and characterized it as a conversation, proto-conversation, or dialogue.21 Some researchers used dialogue as an all-inclusive metaphor for infant–mother temporally patterned transactions, including for example, sucking.22

In the face of such a wealth of empirical data, it is worthwhile to consider the conceptual requirements for terming an interpersonal transaction dialogical or conversational. One critical requirement is the structure of alternation between sound and silence. Consider that conversing adults can only speak one at a time for the dialogue to unfold smoothly and for the interlocutors to follow what the other is saying. Speaking and hearing is, so to speak, a single track phenomenon where only one vocal stream can flow within a given unit of time. To be sure, we hear ourselves as we speak, but we cannot emit speech and process speech emitted by our interlocutor simultaneously. That is why verbalized dialogue has a sequential rather than a simultaneous structure: it involves ongoing alternations between conversational turns held by the speakers in view of shared understanding. An interpersonal exchange involving infants will count as dialogical or conversational if it involves such an alternating turn-taking pattern between sound and silence. Infants who engage in such a turn-taking exchange can be said to partake in the fundamental temporal structure of dialogue, even though the vocalizations they emit and hear are of a pre-symbolic type. Their interactions with others may in this way count as instances of proto-conversation—alternating vocal exchanges which may serve to create social bonds.

3 Dialogical synchronization in adult conversation

Proto-conversational abilities in infancy indicate that the rhythms of dialogue cut across preverbal and verbal stages in social development and are characteristic of the temporal patterning of face-to-face interactions both in infancy and adulthood. Unsurprisingly then, the model of temporal coordination in adult dialogue has been successfully used in the study of infant communication. Jaffe and Feldstein23 devised a dialogical model that they claim provides an exhaustive classification of everything that can possibly happen in a two-person vocal exchange. Note however that it is a pragmatic rather than a semantic model: it is indifferent to the content of the conversation and limited to recording the patterns of sound and silence in the turn-taking interaction. The temporal flow of a dialogue between two adult speakers is parsed into five parameters called vocal states: vocalization, pause, switching pause, and interruptive and non-interruptive simultaneous speech. Importantly, these vocal states, which are discernible in adult conversation, can also be seen at work in mother–infant conversational exchanges, as I will discuss below.

Consider the five vocal states in more detail.24 A vocalization is a continuous utterance of one individual. It may contain silences shorter than 250 ms (such brief silences are attributable to stop consonants in speech). A pause is a joint silence greater than or equal to 250 ms which occurs within the speaker’s turn (the turn begins at the moment either interlocutor vocalizes alone and is held until the other vocalizes alone, at which point the turn is exchanged). A switching pause is a joint silence greater than or equal to 250 ms initiated by the turn holder, but terminated by a unilateral vocalization of the partner, who thereby gains the turn. Jaffe and Feldstein assign the switching pause to the speaker whose turn it terminates. Finally, there are two types of simultaneous speech, both initiated by the partner who does not hold the turn (the listener or addressee). Non-interruptive simultaneous speech is that which begins and ends while the partner who holds the turn continues to vocalize. Interruptive simultaneous speech is initiated by the listener or addressee while the turn holder is vocalizing but then continues after the turn holder falls silent.

Note that the parsing of dialogue into the vocal states (vocalization, pauses, and simultaneous speech) is based not only on the active and receptive values of the vocal stream, i.e. on whether either interlocutor is emitting sound or keeping silent. It also takes into account the turn-taking character of the dialogue, i.e., it follows the so-called turn rule. The turn rule stipulates that a turn begins at the instant either interlocutor vocalizes alone, and is held until the other vocalizes alone (the turn is then exchanged). The turn can therefore be a composite of some or most vocal states, for example a sequence of vocalizations and pauses, as well as instances of non-interruptive simultaneous speech. The switching pauses and the interruptive simultaneous speech mark the end of the turn of the speaker, who thus yields the floor to her partner. A combination of two turns yields a cycle that is termed the interpersonal turn rhythm.25
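The turn rule can be illustrated with a minimal sketch. It is not the researchers' own program, and it rests on stated assumptions: the exchange is already available as a sequence of samples, one every 250 ms, each encoded by an illustrative convention (0 = both silent, 1 = A vocalizes alone, 2 = B vocalizes alone, 3 = both vocalize); any joint silence of at least one sample is treated as lasting at least 250 ms; and the function names `assign_turns` and `classify_silences` are hypothetical. The sketch covers turn holding and the distinction between pauses and switching pauses only; the two kinds of simultaneous speech are left out.

```python
def assign_turns(codes):
    """Return the turn holder ("A", "B", or None) for every sample.

    Turn rule: a turn begins the instant either partner vocalizes
    alone and is held until the other partner vocalizes alone.
    Joint silence (0) and simultaneous speech (3) leave the holder
    unchanged.
    """
    holder = None  # nobody holds the turn before the first solo vocalization
    holders = []
    for code in codes:
        if code == 1:
            holder = "A"
        elif code == 2:
            holder = "B"
        holders.append(holder)
    return holders


def classify_silences(codes):
    """Label each run of joint silence as a pause or a switching pause.

    Returns a list of (kind, holder, length_in_samples). A silence is a
    switching pause if it is ended by the partner vocalizing alone; it
    is credited to the holder whose turn it terminates. Silences before
    the first turn are ignored.
    """
    holders = assign_turns(codes)
    labelled = []
    i, n = 0, len(codes)
    while i < n:
        if codes[i] != 0 or holders[i] is None:
            i += 1
            continue
        j = i
        while j < n and codes[j] == 0:
            j += 1  # extend the run of joint silence
        holder = holders[i]
        partner_alone = 2 if holder == "A" else 1
        ends_turn = j < n and codes[j] == partner_alone
        kind = "switching pause" if ends_turn else "pause"
        labelled.append((kind, holder, j - i))
        i = j
    return labelled


# Made-up example sequence, not data from any study.
example = [1, 1, 0, 1, 0, 0, 2, 2, 3, 2, 0, 1]
turn_holders = assign_turns(example)
silences = classify_silences(example)
```

In the example, A's first silence is followed by A's own vocalization and so counts as a pause, whereas the two-sample silence that B ends by vocalizing alone counts as A's switching pause.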

It goes without saying that the Jaffe–Feldstein model is limited by its focus on recordable vocal behavior (sound and silence). One could object that there is more to a human conversation than what this model is able to capture. For example, listening and paying attention to what is being said is a vital component of a well jointed dialogical exchange. A dialogue is not simply a sequence of individual monologues that happen to be timed in an interlacing pattern. It is not just an on and off cycle of sound and silence, of activity on the speaker’s side and passivity on the listener’s side. Listening is also a way of being active by attending to the speaker’s words, and the speaker is typically aware of her interlocutor’s attention (or lack thereof) to what she says. The Jaffe–Feldstein model, limited as it is to recording vocal behavior, cannot capture these mutual attentional states. Note also that due to its rigid turn rule, which consistently ascribes a piece of behavior to an individual interlocutor, the Jaffe–Feldstein model leaves out the elements of dialogical exchange that do not follow a strictly sequential order but afford simultaneity. For example, eye contact or smile sharing do not follow a turn-taking pattern but occur at one time in vocal exchanges in the face-to-face mode. Such cases of simultaneous engagement through the gaze and facial expression testify to the interlocutors mutually attending to one another and are therefore an integral part of face-to-face communication. They are de facto part of the turn-taking exchanges between interlocutors: both speaker and listener simultaneously attend to what is being communicated both verbally and gesturally to one another. It follows that there is simultaneous mutual attending within dialogue which is irreducible to the sequencing of individual conversational turns, even though the latter are undeniably a key component of dialogue.

Finally, due to its focus on recordable observable behavior, the Jaffe–Feldstein model misses not only the levels of mutual attention and interest, but also the affective charge of the exchange, that is, what it feels like for the interlocutors to engage with one another. This constraint imposed by the technological instruments used will have a profound impact on the theoretical model of dialogue provided by the researchers: it will favor an externally defined model of dialogical interaction in terms of patterns of recordable behavior which still needs to be supplemented by personal accounts of experienced affect.

Despite its confinement to recordable vocal behavior arranged in a sequence of individual turns, the Jaffe–Feldstein model offers clear advantages to dialogue researchers. Thanks to its logical rigor, it can be easily coded by a computer program and used in diverse experimental studies of vocalized dialogical relations. Consider the mechanics of this coding process. The experimenters begin by recording vocal behavior (sound or silence) of the interaction on two separate channels of a stereo tape recorder. The two audio signals provide input to a computer system called the Automatic Vocal Transaction Analyzer (AVTA).26 AVTA performs an analogue-to-digital conversion of the audio input. It samples both channels simultaneously every 250 ms to determine whether each of the participants in the interaction is vocalizing or silent (i.e. whether the signal in each channel is on or off). As previously noted, it is the temporal pattern of sound and silence rather than the semantic content recorded that is being studied.

The time series thus obtained is converted into a sequence of numbers which represent the four observable dyadic states of the dialogue: 0 = partners A and B are both silent, 1 = A is vocalizing while B is silent, 2 = B is vocalizing while A remains silent, and 3 = A and B vocalize simultaneously. It is possible to switch back and forth between this dyadic code and the two-channel time series, so the identities of the two interlocutors are preserved. The AVTA program converts the 0–3 numbers into the five vocal states composing the turn-taking sequence of the dialogue (vocalization, pause, switching pause, and interruptive and non-interruptive simultaneous speech) and averages their duration per time unit, typically a minute of interaction. These data are revealing in that they show whether or not the partners have a mutual effect on one another over the course of the conversation. In fact, Jaffe and Feldstein27 documented a powerful phenomenon termed “vocal congruence,” in which interlocutors tend to match the time patterns of their speech. The major vocal states the researchers found to be congruent were pauses and switching pauses. The durations of vocalizations and turns were not generally found to match. The researchers did not examine whether simultaneous speech is matched since it occurs relatively rarely, especially in orderly or “polite” conversations among adults. The authors therefore deliberately excluded cases of a conversational clash, which is typically associated with affectively charged exchanges amongst angry partners or lovers.
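The reversible mapping between the two-channel time series and the dyadic code can be sketched as follows. This is only an illustration of the 0–3 coding described above, not a reconstruction of the AVTA software; the function names `dyadic_code` and `split_channels` are hypothetical, and each channel is assumed to be already reduced to one on/off value per 250 ms sample.

```python
def dyadic_code(a_on, b_on):
    """Combine two on/off channels into dyadic states.

    0 = both silent, 1 = A alone, 2 = B alone, 3 = both vocalizing.
    """
    if len(a_on) != len(b_on):
        raise ValueError("both channels must have the same number of samples")
    return [int(bool(a)) + 2 * int(bool(b)) for a, b in zip(a_on, b_on)]


def split_channels(codes):
    """Recover the two on/off channels from a dyadic code sequence."""
    a_on = [bool(code & 1) for code in codes]  # low bit: A is vocalizing
    b_on = [bool(code & 2) for code in codes]  # high bit: B is vocalizing
    return a_on, b_on


# Made-up samples covering all four dyadic states.
channel_a = [False, True, False, True]
channel_b = [False, False, True, True]
codes = dyadic_code(channel_a, channel_b)
```

Because each speaker occupies its own bit of the code, no information about who vocalized is lost, which is what allows the interlocutors' identities to be preserved when moving between the two representations.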

There is general agreement among researchers that conversational congruence reflects the interlocutors’ respective susceptibility to mutual influence, or their respective ability to accommodate others in their own actions. Researchers note that such rhythmic adjustment to the other may be a key factor in our emotional response and evaluation of the partner: speakers who tend to mutually adjust their temporal speech patterns to match that of the other also tend to see each other as more attractive, warmer, and similar than those who do not; they are also found to be more interpersonally sensitive towards others.28 It may be that those perceptible “good vibrations” in the timing of our transactions with others help us to feel that we connect with them both emotionally and morally.

Conversational congruence thus seems to enable and shape empathic understanding. This finding is significant for the way we think about affectivity and intersubjectivity in an interdisciplinary context. For, in addition to supplementing an externally-determined model of dialogical interaction with the felt dimension of that interaction, phenomenological investigation of the phenomenon of affectivity would also help to draw phenomenology out of its static cave and into genetic methods and issues like embodiment and interpersonal relatedness. If it is the case that conversational congruence is a developmental feature of human life, which depends on the practice of dialogue with responsive others from infancy and early childhood onward, then it suggests an expressly dynamic and social view of affectivity, which in turn demands a more genetic kind of investigation. On this latter reading, affective attunement would be irreducible to an innate, phylogenetically generated and/or ontogenetically fixed feature of human life. Rather it would appear to be generated (at least in part) and dynamically modulated by the kind of intersubjective interaction and synchronization that takes place in preverbal exchanges between mother and infant.

4 Dialogical competency in infants

The Jaffe–Feldstein adult dialogue model of temporal coordination has been successfully applied to the studies of interpersonal interactions in infancy. For example, Bakeman and Brown29 examined “behavioral dialogues” in mother–infant interaction in neonates. Numerous researchers focused on gaze and kinetic interactions at the age of four months.30 Vocal interactions have been studied at the age of four months31 and nine months.32 In these studies, researchers recorded spontaneous interactions between infants and their mothers. The recordings were fed into the Automatic Vocal Transaction Analyzer described above, and out of the sequences of sound and silence the five basic vocal states and their patterns of mutual coordination were derived. As Jasnow and Feldstein observe, the turn-taking pattern was clearly visible in the infant–mother exchanges: “the […] data make it plain that alternating vocalization was much more common than simultaneous vocalization and indicate that the vocal exchanges between these preverbal infants and their mothers partake of the same essential format that governs conversations between linguistically competent adults.”33 Together with the other existing data on reciprocal active and passive patterns in infant–mother interaction, these findings “lend support to a view of the mother–infant pair as a system that follows basic rules of dialogical exchange prior to the development of linguistic competence.”34

Another important finding of the vocal interaction studies is that the switching pause durations between infants and their mothers are positively correlated. Jasnow documented that this correlation occurred due to mutual or bidirectional influence. The matching of switching pauses points to another similarity between dialogue in adulthood and infancy, since in both cases the switching pauses in vocal exchanges tend to match.35 This positive correlation shows that both participants tend to adopt a uniform manner in which they regulate the exchange of conversational turns. Importantly, the switching pause is a saliently interpersonal element of the exchange: it is a shared silence which regulates the pace of the back-and-forth movement of the conversation in its passage from one speaker to another. It is therefore telling that it is the switching pauses (rather than the pauses within individual utterances) that are matched via bidirectional influence within infant–mother pairs.
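What a positive correlation between switching pause durations amounts to can be made concrete with a small sketch. It computes a plain Pearson coefficient across dyads and is offered only as an illustration: it is not the analysis used in the studies cited, it cannot by itself establish the direction of influence, and the numbers are invented toy values rather than empirical data. The function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Invented mean switching pause durations in seconds, one value per dyad.
mother_pauses = [0.5, 0.7, 0.9, 1.1]
infant_pauses = [0.6, 0.8, 0.9, 1.2]
r = pearson_r(mother_pauses, infant_pauses)
```

A coefficient above zero indicates that dyads in which the mother's switching pauses run long tend also to be those in which the infant's run long, which is the sense of matching at issue here.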

One might wonder whether the documented absence of simultaneous vocalization and the matching of switching pause duration necessarily testify to a bidirectional or mutual influence between mother and infant. Another feasible interpretation would be that they are one-sidedly facilitated by the adult participant, who relies on her symbolic conversational competence to alternate vocalization by consistently avoiding or withdrawing from simultaneous vocalization and synchronizing the switching pause. In that case, the absence of simultaneous vocalization and/or the consistency of switching pause duration in mother–infant exchanges would tell us more about the temporal structure that governs verbalized conversation between competent adult speakers, and how these rhythms are communicated to the preverbal infant. The empirical findings could then be interpreted as revealing a process by which the infant is enculturated by the adult—a kind of training for future language acquisition—rather than indicating any innate temporal structure intrinsically governing the actions of the infant.

It seems that additional research is needed to settle the question of which interpretation should gain more currency, and whether the “natural” and “cultural” interpretative narratives might in fact be reconciled. One way of settling the debate would be to study whether the structure of alternation and switching pause matching are apparent in infant–infant vocal exchanges, and if so, how they compare to adult–infant conversational interactions. Anecdotal evidence suggests that “baby talk” between young infants contains the elements of conversational congruence found in the research by Jasnow, Feldstein, and others, but more exact reports are needed still. If available, the evidence for infant–infant conversational congruence would strengthen the case for the primacy of dialogicality in human social relations. At the same time, the presence of an enculturating process in infant–adult vocal exchanges does not weaken the case for the primacy of dialogicality. It suggests rather that dialogicality ought not be read as an exclusively innate and genetically fixed dimension of human sociality, even though it may be phylo- and onto-genetically determined to a degree. Such determination does not preclude cultural influence, as has been noted by other developmental psychologists who study sociality in infancy and early childhood and who suggest that the adult constitutes the social scaffolding of the infant.36 Dialogicality may straddle the nature/culture divide and display both biologically and socio-culturally acquired features from infancy onward.37

Notwithstanding such similarities, the analogy between adult–infant vocal exchange and adult–adult conversation is not complete, as is indicated by research in developmental psychology. The pauses within utterances which show strong congruence in adult conversations do not match in mother–infant interactions. However, the differences are not limited to this specific finding. Needless to say, the infant’s vocalizations are not structured linguistically. Nor is the mother “speaking” to the infant in the same manner she would to an adult interlocutor. In contrast to adult-directed speech, her vocalizations tend to be shorter, pauses longer, and pitch higher when addressing an infant.38 Trevarthen39 noted a universal presence of ‘intuitive motherese’ characterized by an adagio beat in the mothers’ mode of address of their babies across cultures. These specificities of mother–infant interaction and the demonstrable differences between vocal congruence in infancy and adulthood do not however undermine the case for the primacy of dialogicality in human social relations as long as we view it along a continuous curve from birth to adulthood, without expectation of strict identity between the earliest and the subsequent developmental states. It would go against a genetically sensitive approach to expect a strict overlapping between the patterns found in adulthood and infancy. Nor is it necessary to locate identical structures in the early and later phases of ontogenesis to make a strong case for the consistently dialogical character of interpersonal transactions in the face-to-face mode. The argument for continuity between infancy and adulthood leaves room for both difference and similarity between them.

Few authors have studied the consequences of these early rhythmic patterns of coordination between infant and mother for later development. A notable exception to the rule is an extensive infancy study conducted recently by Jaffe, Beebe, Feldstein, Crown and Jasnow,40 which followed the Jaffe–Feldstein approach in the examination of vocal dialogues. The researchers looked into how the coordination of vocal rhythm at age 4 months predicts the trajectory of social and cognitive development at age 12 months. They found significant correlations between the degree of vocal matching in adult–infant pairs and the infants’ performance on standard tests evaluating attachment and cognition (respectively, Ainsworth Strange Situation and Bayley Scales) around their first birthday. They were thus able to show that the degree of coordinated interpersonal timing in infancy has direct repercussions for the patterns of social relatedness and cognition in later years. As Rochat41 notes, this is especially remarkable considering that the assessment of cognition involved tasks which were not obviously social, such as stacking blocks or looking for hidden objects.

The research of Jaffe and his colleagues is innovative in that they studied not only the infant’s rhythmic patterns in the interactions with the mother in the familiar home setting, but also looked into how a stranger and an unfamiliar setting of the lab affect the infant’s interpersonal timing. This research confirms the earlier findings that infants are fully-fledged participants in bidirectional turn-taking interactions with others. Notably, they reaffirmed the previous data showing that infants tend to match the switching pause duration just like the adult interlocutors tend to do. However, thanks to the inclusion of an unfamiliar social partner in the lab setting, the researchers were also able to document another similarity between vocal interactions in infancy and adulthood. Like adults, infants discriminate between social partners (e.g. mother, stranger) and are sensitive to the context of the interaction (e.g. home, laboratory). They exhibit a greater level of temporal coordination in their interactions with an unfamiliar person in an unfamiliar environment than with the mother at home. As the researchers note, such tighter conversational rhythms are also typical of adults conversing with a novel as opposed to a familiar partner.42 This similarity in the relative increase and decrease of social coordination depending on the degree of familiarity suggests that infants, like adults, may actively use the interactive exchange as a fertile ground for adjusting or affectively “attuning” themselves to the unfamiliar partner (in a way one would practice with an unfamiliar fellow musician), and for making predictions about her behavioral patterns (or, in the case of a musician, the playing style). 
A tightly knit vocal exchange suggests a higher level of vigilance and insecurity with a novel interlocutor than with a familiar partner such as the mother, where the interaction may be less constrictive and the participants feel greater ease in being together. Such loosely knit temporal coordination may leave more room for play, variation and creativity in interpersonal interaction, both in infancy and adulthood.

These points are important because they underline the insufficiency of a purely externally-determined model of dialogical interactions. Based on the research reviewed here, it appears that a high level of temporal coordination may signify both an emotional response to the partner perceived as attractive and warm, with whom one has a sense of empathic connection,43 and the emotional response to the partner perceived as a relative stranger, who increases one’s sense of insecurity and vigilance, and where presumably the feeling of empathy would be relatively low.44 In the latter case, the tightly knit interaction seems to serve the purpose of establishing the felt interpersonal connectedness which is already operative in the former case. Yet since similar observable behavior may indicate either the achievement of emotional attunement or an effort to establish such attunement (in its presumed absence), it is insufficient to use the behavioral record alone as an indicator of what it feels like to be interacting with this particular partner in this particular situation. The perceptible good vibrations are therefore irreducible to the quantifiable interpersonal congruence, and need to be fleshed out in a phenomenological description of what the given interaction felt like. A methodologically more complex approach such as this, which combines the study of observable behavior with first-person reports, would offer a more comprehensive account of intersubjective relations in the face-to-face mode. It would also contribute to the development of a mutually constraining and enlightening interdisciplinary line of inquiry between phenomenology and cognitive science.

However, the plea to include phenomenological reports in empirical research is, in the case of infants, constrained by the subjects of inquiry themselves. For even though such a plea may be applicable to the investigations of older children and adults, it does not apply to newborns. Phenomenologists consult empirical research in child psychology in part because their own descriptive methods are inapplicable to infancy and childhood. How can one expect the speechless infant (from the Latin in + fari = non + speak) to report what it feels like to play with mommy rather than an unfamiliar lab employee, and whether the familial playroom at home is more inviting to roam around in than the sterile hospital unit? The child psychologist’s limited focus on the externally observable aspects of interpersonal relations in infancy is not a sign of gross oversight. This limitation is intrinsic to empirical investigation of the earliest stages of human development. Thus, even though feeling good vibrations is indisputably an important aspect of dialogical relations, it is also one that, in part, eludes an interdisciplinary developmental approach.

Nevertheless, we might consider how personal reports by the infant’s significant others might enrich the overall account of interpersonal interaction. For instance, reports from the mother or another primary caregiver about what it feels like to interact with this particular infant in a given situation, about how the affective tenor of her participatory interactions with the infant compares with that of interactions that she observes in a non-participatory manner: these might shed light on the infant’s perceived affective state in different settings and with diverse partners. Affects are not, after all, enclosed in the sphere of inner private life but circulate between the self and the other in the first-to-second person exchange. Even though some of this emotionally expressive comportment, such as the patterns of gaze and smile, can be recorded and quantified, extra depth may be gained in the overall account of intersubjective relatedness in infancy, by introducing the experiential perspective of one of the participants regarding the experienced affective tenor of the range of interactions.

5 Towards verbalized dialogue

Despite the wealth of studies on dialogical relations in infancy, there are, to my knowledge, currently no studies specifically investigating the relation between the congruence in vocal (and other pre-symbolic) interactions, and the subsequent linguistic competence in the child. However, if it is valid to call the rhythmic turn-taking exchanges in infancy proto-conversations, one can also advance the claim that the proto-speaker and proto-addressee roles are taking shape in the infant’s face-to-face interactions with the caregiver. It is reasonable to assume that the bidirectional matching between the infant and the mother, which provides one of the first instances of the self taking the other into account, serves as a developmental foundation for the verbalized relation between interlocutors who take each other into account in speaking and listening to one another. If that hypothesis is correct, then the acquisition of the interrelated I and you pronouns, which mark first- and second-person reference in language, can be said to gradually emerge from the earliest interchangeable turn-taking roles occupied by the infant and the caregiver during the earliest interpersonal interactions. It may be that the roles of proto-speaker and proto-addressee that take shape in synchronized face-to-face interactions in infancy serve as necessary (albeit by no means sufficient) preconditions for acquiring interpersonal deixis. Since I-you pronouns are canonically deployed in a turn-taking face-to-face interaction, it is reasonable to assume that they have their roots in the turn-taking face-to-face interactions of infancy. It is also reasonable to assume that the earliest exchanges of vocal turns between the infant and the adult provide an excellent practice field for the future transition to verbalized turns and for the acquisition of linguistic personal markers of the co-participants in the turn exchange. 
I agree with Christine Tanz45 that the alternation of turns in conversation provides the macrostructure, while the alternation of the I and you pronouns provides the microstructure of language. The alternating rhythm and role switching of proto-conversations in infancy appear therefore to provide the macrostructure for conversations in which the markers of speaker and addressee will eventually be embedded. The macrostructural level, together with its facial expressions, gaze patterns, and face-to-face orientation, survives from infancy to adulthood as much as it incorporates linguistic elements into the negotiation of the dialogical exchange.

The emergence of verbalized dialogue from the earliest face-to-face interactions ought not be understood in terms of successive stages, but rather as a single process. Consider the two ways in which this emergence could be understood: (1) infancy provides cognitive prerequisites of dialogue, which are necessary but not sufficient conditions of mastering linguistically-coded dialogical roles; (2) key components of dialogical relations in infancy (e.g. context-dependence, face-to-face orientation, turn-taking character of interactions, interchangeable active and receptive roles, non-identical perspectives) are preserved within verbalized adult interpersonal interactions. Both senses underscore a marked continuity between infancy and adulthood, but while the former provides a set of conditions that may have to be met for the child to acquire spoken language, the latter focuses on the gestural and temporal elements of interpersonal interactions that are found along the developmental curve and continue to typify face-to-face interactions in ontogenetically more advanced stages despite the fact that language has come on board. Following Gallagher’s lead,46 I believe that the second view has a greater explanatory force and is corroborated by both empirical evidence and phenomenological analysis of face-to-face interactions in infancy and adulthood. It suggests that the key features of Trevarthen’s primary intersubjectivity are preserved in later life and so are primary throughout direct interpersonal interactions.

My hypothesis about the emergence of first- and second-person linguistic reference from the dialogical roles that are assumed in the earliest forms of face-to-face communication finds support in larger claims that have been made about the relation between face-to-face interaction and gesture on the one hand, and language on the other. For example, Rochat47 draws attention to the uniqueness of human infant-caregiver engagement in extended face-to-face exchanges. Even though nonhuman primates engage in grooming and display affectionate care for one another, they do not seem to engage in reciprocal dialogical exchanges to the same extent that human primates tend to do across cultures. The psychologist concludes that these uniquely human face-to-face dialogues “may be a partial mechanism for the developmental emergence of uniquely human co-cognitive adaptations such as language and explicit thinking in the form of real as well as virtual dialogues.” To take that point further, face-to-face dialogues are the fertile ground in which the seeds of knowledge are cultivated. After all, it is unclear in what manner, other than a conversational exchange, a child could ever learn about the environing world; it is only by favoring an adult-centered bias that one can support the view of solitary study as the doorway to gaining knowledge. In Rochat’s view, however, the developmentally first conversational model of learning about the world gives way to internalized dialogues, as a way of thinking about the world. Thinking, according to this view, is debating mentally with oneself. Following the lead of Lev Vygotsky,48 who argued that all higher mental functions are internalized forms of social interaction, and that they retain this social interactive character even when they are performed in solitude, Rochat calls for further empirical research that would capture the dialogical and social nature of cognition as it unfolds in development. 
For it is only by following the developmental trajectory from the external to the internalized (but still social) mind that we may overcome the individualist bias pervasive in some brands of psychology, and also in the classical ego-centered phenomenology. An expressly genetic line of inquiry thus helps to root not only affectivity but also thinking in the social context of dialogical interaction and synchronization with others, which is manifest already in infancy and early childhood.

My hypothesis about the emergence of linguistic personal markers from dialogical roles practiced in the earliest forms of pre-linguistic face-to-face communication finds further support in larger theoretical claims about the relation between gesture and language. Jaffe and Anderson49 put forward a brief proposal for a gestural communicative hypothesis of speech origin, according to which the evolution of language derives from the interpersonal matching of temporal patterns of gesture. In the authors’ words: “Our view of language origins is in terms of social-emotional communication, conveyed by paralinguistic properties of dialogue, with their major locus in pre-linguistic mother-infant interaction.” Vocal congruence discussed in this chapter, as well as intersubjectively coordinated patterns of gaze and movement,50 provide specific instances of socio-emotional communication in infancy out of which verbalized communication emerges. The authors point notably to the similarities in the time patterns of gestures and the temporal course of vocalizations and pauses that occur during speech. The shared rhythm of the articulatory movements used in speech and the bodily movements in gesture suggests that speech is an evolved type of communicative bodily gesture. On this hypothesis, speaking is primarily an activity carried out with our expressive and communicative bodies, and the movements of speech-making are correlated with the ones involved in gesturing. This hypothesis originating in empirical research resonates with, and is further corroborated by, phenomenological contributions, notably Merleau-Ponty’s gestural conception of language developed in the Phenomenology of Perception.

According to Merleau-Ponty, spoken language is to be understood as a phonetic gesture,51 for we vociferate with our bodies, the windpipe, the vocal cords. Now, if speech is like gesture, it follows, contra Saussure, that the words we speak do not have a purely conventional and arbitrary connection with their referents, but that the connection is to a degree natural. Like gestures, words are laden with an affective content,52 which is all important in poetry. For example, by using tropes such as alliteration, as in “on scrolls of silver snowy sentences” (Hart Crane), the poet lends a musical air and an emotive aura to the sentence. Language is therefore not a neutral conveyor of propositional content but primarily a way of ‘singing the world’—it has, inescapably, certain melodic and rhythmic components, a certain beat (like the adagio beat of the motherese), which carries emotionally rich content. Verbalized language conveys the affectively charged vibrations that child psychologists located already within the rhythmically patterned proto-conversational exchanges between the infant and her caregiver. If language is gestural, as Merleau-Ponty and others argue, then it is laden with affective meaning that is immanent and immediately apparent, without the need to gradually infer it. The mother’s expressive face speaks to the infant just as her words will “secrete” their own meaning to a linguistically competent child. This inherent expressivity of the communicative body may explain why the acquisition of language does not displace nonverbal gesture in development, and why gesture continues to convey affective meaning once language has come on board. The host of nonverbal communicative and rhythmically patterned gestures operative from the first moments of human life appears simultaneously as a pre-linguistic condition and an irreducible paralinguistic component of verbalized dialogue, and thus as integral to intersubjective relatedness.53

The inherent expressivity of the communicative body is intimately linked with the inherent temporality of the body, to which the rhythmically-structured dialogicality of infant–mother interaction testifies. The affective attunement between infant and other, which we earlier described using metaphors of musicians modulating with their fellow players, might also carry over, developmentally, to the melodic and rhythmic experiential aspects of poetic language. A developmental link of this kind—between the temporally- and affectively-patterned character of infant dialogical activity and that of adult musical and poetic expression—would seem to support the argument that personal reference in language developmentally emerges from bodily communicative roles in preverbal dialogue. In any case, such investigations highlight the way that the themes of affectivity and intersubjectivity illuminate a host of human activities that may prima facie seem unrelated.


