Introduction

Recently, Collins et al. (2006) have demonstrated that so-called interactional expertise, developed through linguistic interaction without full-scale practical immersion in a culture, does in fact exist (Collins 2004; Collins and Evans 2007). In their ability to “talk the talk” of a field, interactional experts are indistinguishable from so-called contributory experts, identified by their full-scale practical immersion in the specialist area. Thus, bodily experiences, i.e. first-person experiences of relevance to the domain (Footnote 1), seem inconsequential to how one talks about a domain.

Results obtained by the so-called imitation game devised by Collins et al. (2006) confirm the idea of interactional expertise. Here, a judge (a contributory expert within a particular field) posed written questions to two (to him unknown) respondents, one contributory and one interactional. In the “field” of colour vision, Collins et al. showed that colour-blind people are capable of deceiving judges with normal-colour vision. Despite their (minor) handicap, colour-blind people are linguistically immersed in “colour vision language”, spoken by the majority of people, and they are, therefore, thoroughly acquainted with the language.

Why is this surprising? To fully grasp the bewilderment, consider that colour-blind people (e.g. dichromats) match any perceived colour with a mixture of only two spectral lights and might therefore experience difficulties in distinguishing classes of items that differ only in colour. For example, they may find it difficult to distinguish a ripe from an unripe tomato. Some might also confuse the red and yellow of a traffic light without other clues (e.g. shape or location). Thus, partial colour-blindness is likely to have behavioural consequences, however few and mild. According to the study by Collins et al. (2006), those consequences seem to be non-verbal altogether.

Neurophysiological results that seem to relate linguistic skills to bodily experiences intensify the surprise. For instance, when passively reading words with strong olfactory associations, such as cinnamon or garlic, primary olfactory cortices normally involved in perceptual processing are recruited (González et al. 2006). Thus, mere reading of words recruits neuronal areas which are normally correlated with the actual experience of a smell (for studies on the metaphorical use of action words suggesting that use of action words in a figurative sense is not effective see Aziz-Zadeh et al. 2006; see also Raposo et al. 2009 for discussions on semantic context). Neurons activated as a result of first-person experiences of the referent of a concept (e.g. garlic) are also involved in neural correlates of the concept without simultaneous presentation of the actual object. According to Pulvermüller (2005), given that frequently co-activated neurons strengthen their mutual connections, specific cortical links develop whenever experiences (for example the smell of garlic) correlate with specific language processes (the word “garlic”). (For theories linking words and senso-somatic processing see Barsalou 2008; Meteyard and Vigliocco 2008; for causal links between the motor system and the comprehension of language see Glenberg et al. 2009).

Therefore, it seems plausible that “first-person”-related neural activations would be relevant with respect to the subjects' verbal output, at least when subjects address concepts that refer to tangible objects. Since the phenomenon of expertise is partly connected to hands-on experiences, the neuropsychological insight seems striking (Schilhab and Gerlach 2008a, b).

However, if we turn to results from the cognitive sciences, it is worth noticing that results on verbal characteristics originate from comparatively simple and discrete tasks, not advanced real-life verbal exchanges (see e.g. Holt and Beilock 2006). The linguistic complexity of a dialogue, for example in the imitation game, which attempts to reproduce everyday dialogues, may conceal irregularities in the use of language and blur the identity of the contributory expert. This could explain why colour-blind people, despite differing significantly from people with normal vision in their colour experiences, passed as interactional experts in the imitation game in Collins et al. (2006). If imitation games were directed more explicitly at descriptions of experiences in the first-person sense and elaborated on sensations (presumably missing in the interactional expert), would their conclusions align with, or differ from, previous results?

To challenge the non-embodiment idea of interactional expertise and to explore verbal representations of bodily experiences in conversations, we conducted imitation games that specifically tap into knowledge of bodily experiences (Footnote 2). The imitation game method has immediate potential to empirically identify linguistic differences and frame critical questions of what is meant by language being embodied. In this paper, we report on imitation games that, to some extent, follow the procedures described in Collins et al. (2006) and that were conducted with female midwives who either had or had not given birth themselves.

Procedure

The midwife profession targets expecting mothers during the pregnancy, delivery and the post-pregnancy phases. Besides general health care issues, midwife practices concern bodily experiences connected to pregnancies and motherhood. Thus, the community is likely to employ a professional “jargon” concerned with bodily sensations related to, in this instance, pregnancy, delivery and breastfeeding. This specialist jargon is shared by all midwives, regardless of their personal experiences with pregnancy, delivery and breastfeeding.

In the original setup in Collins et al. (2006), a conversation between a judge and two respondents was categorised as “chance” or “identify”. In the “identify” condition, one of the respondents is ignorant about the jargon of the target field, whereas the other respondent is a contributory expert. Here, the judge is able to spot the deceiver. However, in the chance condition, the respondents are either contributory or interactional experts and therefore know the target jargon.

In the present paper, the jargon of interest is the jargon of motherhood. The research question is whether a non-mother midwife (hereafter called the interactional expert with respect to motherhood) would pass as a mother in questions on pregnancy, delivery and post-pregnancy matters. Following the terminology of Collins et al., our experiments belong to the chance condition, since we assume that both respondents are knowledgeable of the target jargon. If so, judges should be unable to tell apart the mother midwife from the non-mother midwife.

Method

Experimental condition

The experiment consisted of two phases. In phase one, real-time experiments at the university involved computer-based conversations between three participants. In phase two, complete real-time conversations were transcribed and sent to new judges by mail or email (see Fig. 1). Their judgements were statistically treated in the same way as judgements obtained in phase one.

Fig. 1

The experiment consisted of two phases. In phase one, real-time experiments at the university involved computer-based conversations between three participants. In phase two, complete real-time conversations were transcribed and sent to new judges by mail or e-mail

Phase one

Seven conversations were conducted in which a mother midwife posed questions to a non-mother midwife and a mother midwife. Participants were recruited among employees at public hospitals all over the country. This was done deliberately to reduce the risk of regional differences appearing as a confounding variable. The average age of interactional experts was 39.3 years (varying from 29 to 52 years) and 43.3 years for contributory experts (varying from 29 to 55 years). The number of years spent practising midwifery varied between 2 and 25 years, with an average of 12 years, for interactional experts and between 2 and 23 years, with an average of 10.5 years, for contributory experts.

Subjects were seated in three different rooms and kept anonymous at all times. Judges were explicitly instructed to ask questions that identify bodily sensations of giving birth, pregnancy and motherhood. To get a first approximation of productive questions, they were shown written examples from the “field” of driving. Since the midwife community in Denmark is fairly small, the judge was told to avoid questions about identity. Moreover, she was instructed to ask questions that might help her to distinguish the mother from the non-mother midwife. After each question, the judge reported her answer—if possible—on a scale from 1 to 4:

  1. “I have no idea of who is who”,
  2. “I am more in doubt than sure”,
  3. “I am more sure than in doubt”,
  4. “I am pretty sure of who is who.”

When the judge had either decided who was who, or was incapable of making a judgment, the experiment stopped. The interactional expert (non-mother midwife) was instructed to pretend to be a mother midwife, while the contributory expert was told to answer the questions as truthfully as possible. An interview typically lasted between 45 and 75 min and consisted of six to ten questions. No subject was informed about the details of her role in the experiment until the debriefing, and all were paid for their participation.

Control condition

To examine whether midwives actually possess a language of pregnancy and thus a putative ability to deceive in matters about pregnancies even without personal experiences, we conducted four additional experiments within the chance condition in which the judge, now a lay mother, posed questions to a mother and a non-mother midwife. Because the distribution of answers expected by chance is difficult to predict, we used the distribution of answers given by lay mothers (mothers who are not trained midwives) as the baseline. Even if the category of lay mothers appears to be heterogeneous with respect to educational background and social status, all lay mothers are knowledgeable of motherhood. So, if judgments by lay mothers do not differ from those of midwives, midwifery does not improve the judges' ability to distinguish or the non-mother midwives' ability to deceive. Hence, we could use lay mother-conducted conversations to test for the existence of a professional jargon in midwifery. Lay mothers were recruited among students and in the local community by advertisement. The average age of interactional experts was here 46.3 years (varying from 32 to 53 years) and 41.5 years for contributory experts (varying from 34 to 51 years). The number of years spent practising midwifery varied between 5 and 25 years, with an average of 16 years, for interactional experts and between 8 and 18 years, with an average of 11.3 years, for contributory experts.

Phase two

In phase two, the 11 conversations from the experimental and control conditions in phase one were transcribed into a standard format to eradicate any unintentional distinguishing marks. The transcripts were mailed to ten mother midwives and ten lay mothers to counteract any effects of superficial characteristics, such as long versus short replies, number of answers in the conversations etc. (see Fig. 1). Phase two judges were recruited all over the country to avoid bias of the results as a consequence of regional differences. The transcripts were distributed in random order from judge to judge to avoid any learning bias across sessions.

Equipment

Conversations were performed using either the free Microsoft messenger system or the system used in Collins and Evans (in print; Footnote 3).

Data

The quantitative results presented here are all obtained from the judgments in the experimental and control conditions in phases one and two. Following Collins et al. (2006), all answers were computed according to the following rules: levels 1–2 were treated as “uncertain”, even if the answer was correct; levels 3–4 were treated as “wrong” if false and “right” if true.
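The scoring rule can be written down as a small function. The following is a minimal sketch, not the authors' analysis code: the function names `score_judgment` and `dichotomize` are hypothetical, and the second function reflects the “correct” versus “non-correct” collapse applied in the Results.

```python
def score_judgment(level: int, guess_is_correct: bool) -> str:
    """Map a confidence level (1-4) and the correctness of the guess
    to one of the three categories used in the analysis."""
    if level not in (1, 2, 3, 4):
        raise ValueError("confidence level must be 1, 2, 3 or 4")
    if level <= 2:
        # Levels 1-2 count as "uncertain" even when the guess is correct.
        return "uncertain"
    # Levels 3-4 count as "right" or "wrong" depending on the guess.
    return "right" if guess_is_correct else "wrong"


def dichotomize(category: str) -> str:
    """Collapse "wrong" and "uncertain" into a single "non-correct" class."""
    return "correct" if category == "right" else "non-correct"
```

Under this rule, a correct guess given at level 2 still counts as “uncertain” and therefore as “non-correct” once the categories are collapsed.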

Results

Phase one

To examine whether non-mother midwives can fool mother midwives, we compared the results from the experimental condition with those from the control condition by Fisher's exact test (see also Collins et al. 2006). To use Fisher's exact test, a four-cell table is constructed. The responses have been dichotomized to “correct” versus “non-correct”, thereby collapsing other response categories into one category. Consequently, wrong and uncertain guesses are treated as one category. The test showed that the distribution of judgments differed significantly between the two groups (see Fig. 2, p < 0.05), indicating that midwives are significantly better at distinguishing between mother midwives and non-mother midwives than lay mothers are.
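For readers unfamiliar with the test, a two-sided Fisher's exact test on a four-cell table can be sketched with the standard library alone. This is an illustrative sketch, not the software used in the study; the function name is hypothetical, and the counts in the usage lines are placeholders chosen only to match the number of conversations (seven and four), not the judgments actually obtained.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups of judges; columns are "correct" and
    "non-correct". The p-value sums the hypergeometric probabilities of
    all tables with the same margins that are no more likely than the
    observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Probability of x "correct" judgments in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Placeholder counts (NOT the study's data): 7 midwife judges, 4 lay judges.
hypothetical_p = fisher_exact_two_sided(6, 1, 0, 4)
print(f"two-sided p = {hypothetical_p:.4f}")
```

An exact test is a natural choice here because with only eleven conversations the expected cell counts are too small for a chi-square approximation to be reliable.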

Fig. 2

In phase 1, in the experimental condition, seven conversations were conducted in which a mother midwife posed questions to a non-mother midwife and a mother midwife. The control condition consisted of four conversations in which a lay mother posed questions to a non-mother midwife and a mother midwife. Wrong and uncertain guesses are treated as one category to be compared with right guesses. The test showed that the distribution of judgments differed significantly between the two groups (p < 0.05)

Phase two

Though the result is significant, the number of conversations in phase one is low, which increases the influence of outlier phenomena. For example, we could be faced with four extraordinary control conversations which were not representative of how conversations would normally play out or of the normal outcome of judgments. Both to compensate for the number of control conversations and to examine whether midwives were better judges than lay mothers overall, thereby eliminating the uniqueness effect of individual conversations, in phase two we compared the response patterns between midwives and lay mothers considering various restrictions on data. First, the fact that one lay mother applied the score “uncertain” through all 11 cases raises the question of how to handle such response patterns in the analyses. Second, the dichotomization of responses to “correct” versus “non-correct”, collapsing alternative response categories, may hide important discrepancies between the groups. Consequently, a varying number of response categories and inclusion or exclusion of one specific lay mother will appear in the statistical analyses below.

To statistically validate the totals from each of the groups, presented in Fig. 3, a test for homogeneity across individuals within each group is carried out. For each individual, the responses across 11 cases (i.e. conversations) are assumed to be independent, and a multinomial distribution applies to the number of certain, uncertain and wrong judgments. Simple chi-square tests for comparing the ten multinomial distributions on three response categories result in p < 0.01 (df = 18, lay mothers) and p > 0.05 (df = 18, midwife mothers). One individual (with all responses uncertain) contributes much to the significant chi-square for the lay mothers. If this individual is removed from calculations, the chi-square test for homogeneity is not significant, with p > 0.05. Under the dichotomized regime using only two response categories, correct vs. non-correct, the chi-squares result in p > 0.05 for both groups.
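The degrees of freedom reported above follow from the shape of the table: ten judges by three response categories gives (10 − 1) × (3 − 1) = 18. The sketch below, with a hypothetical function name and placeholder counts rather than the study's data, computes the Pearson chi-square statistic and its degrees of freedom for such a judges-by-categories table; it assumes every judge rated all 11 conversations and leaves out the p-value, which would require a chi-square distribution function.

```python
def homogeneity_chi_square(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a
    judges-by-categories table of counts (one row per judge)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells in empty rows or columns
                statistic += (observed - expected) ** 2 / expected
    # Only non-empty rows and columns contribute degrees of freedom.
    rows = sum(1 for total in row_totals if total > 0)
    cols = sum(1 for total in col_totals if total > 0)
    return statistic, (rows - 1) * (cols - 1)


# Placeholder counts (NOT the study's data): ten judges, 11 cases each,
# columns are right / uncertain / wrong. The last judge is uncertain
# throughout, mimicking the response pattern described in the text.
hypothetical_table = [[4, 5, 2]] * 9 + [[0, 11, 0]]
statistic, df = homogeneity_chi_square(hypothetical_table)
print(f"chi-square = {statistic:.2f}, df = {df}")
```

A single judge whose row departs strongly from the pooled proportions, such as the all-uncertain row in the placeholder table, can account for most of the statistic, which is why the analysis is repeated with that individual excluded.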

Fig. 3

Here, the Y-axis shows percent correct, and the X-axis depicts lay mothers and midwives, respectively. The bottom line of the box indicates the 25% fractile, while the top line indicates the 75% fractile of correct answers. The vertical bars indicate the 10% lowest and 10% highest values. A reference line at 2/9 = 0.222 indicates the level of correct judgments expected if they were a result of guessing

Comparisons between the two groups, midwife mothers and lay mothers, can therefore, in case of the dichotomized response regime correct versus non-correct, be undertaken simply by means of comparing the totals. In case of analysing group differences using the three-response categories, it is necessary to consider the inclusion of the individual with all responses scored uncertain.

However, before this step, a statistical analysis of the response patterns seen from the point of view of guessing should be carried out. Are midwife mothers and lay mothers creating a response pattern which differs from simple guessing? Implementing a hypothesis of guessing into the individual multinomial distributions (Footnote 4), ordinary goodness-of-fit chi-squares can be applied to each individual and summed up across individuals. These tests clearly reject the hypothesis of guessing, both with two and with three response categories and with or without the specific lay mother who used the category uncertain for all responses.

Figure 3 displays the distribution of percent correct judgments for all individuals in the two groups by means of a simple box plot. A reference line at 2/9 = 0.222 indicates the level of correct judgments expected if they were a result of guessing. Visually, the graph indicates systematically higher values for midwife mothers than for lay mothers. This is supported by a non-parametric Wilcoxon test showing significantly different levels for the two groups.

The summarising analyses between the two groups have been carried out considering (1) two or three response categories and (2) inclusion or exclusion of the lay mother with all responses given as uncertain. The tests used are ordinary chi-squares from contingency tables (Footnote 5). In brief, all tests show that there is a significant difference between midwife mothers and lay mothers, p < 0.05. The interpretation is that midwife mothers have higher frequencies of correct judgments and fewer wrong and uncertain responses compared with lay mothers.

Figure 4 displays the totals from one set of analyses, using two-response categories and all respondents.

Fig. 4

Totals for the two groups in phase 2 summing up the responses across 11 cases in three response categories “right”, “wrong” and “uncertain” using Fisher's exact test

When the specific lay mother giving 11 uncertain responses is removed, the 48 uncertain responses in column 2 drop to 37.

Discussion

In view of the theory of interactional expertise, imitation games where mother midwives interview mother midwives and non-mother midwives about pregnancy, delivery and breastfeeding belong to the chance condition. Both categories of midwives are genuine practitioners and therefore contributory experts capable of conversing about the field (Collins et al. 2006). Only personal experiences with pregnancy, delivery and breastfeeding distinguish mother midwives from non-mother midwives. If these personal experiences do not add significantly to how one refers to them, as conjectured by the theory of interactional expertise, we would expect mother midwives and non-mother midwives to express themselves similarly in these matters. Consequently, we would expect the distribution of judgments by mother midwives to be equal to that of lay mothers, who have knowledge of motherhood. Both in phase one, in which the judge herself was responsible for the questions asked, and in phase two, in which the judge was exposed to ready-made material, the distribution of judgments in the experimental and control conditions differed significantly. When considering the proportion correct for each of the midwife mothers as compared with each of the lay mothers, the proportions for midwife mothers were both greater than in the control condition and greater than chance.

If bodily experiences are of no significance for how one talks about a domain, mother midwives should be unable to distinguish between mother midwives and non-mother midwives. The findings presented here suggest that mother midwives can see through the professional language and identify the mother midwife, whereas lay mothers in the control condition apparently were deceived by the linguistic aptitude of non-mother midwives.

A number of possible objections could be raised. First, how can we be sure that the professional language of midwives resembles a jargon of motherhood of some kind? If the professional language shared by midwives does not apply to the talk of pregnancy, delivery and breastfeeding, non-mother midwives could not possibly be acquainted with the jargon and would subsequently be exposed in a test. However, even if the overlap between motherhood jargon and the professional jargon of midwifery is not complete, the control condition seems to suggest that non-mother midwives are competent motherhood language users, at least enough to deceive lay mothers. And since lay mothers, more than anyone, are assumed to possess motherhood language, the fact that non-mother midwives can pass as mother midwives is intriguing. How is it that non-mother midwives can conceal their identity before lay mothers but not before midwife mothers? Obviously, midwives are professionals. They meet hundreds of pregnant women a year and witness a similar number of deliveries. Due to their profession, it is likely that their register of pregnancies and deliveries (Footnote 6) is colossal. This optimises the performance of mother midwives as judges. On the other hand, it also improves the ability of the non-mother midwife to give reasoned answers (see example of conversation, Fig. 5).

Fig. 5

Can you tell the difference? The excerpt is from a conversation that consists of eight questions and answers between a mother midwife and a non-mother (b) and a mother midwife (a)

While it seems likely that lay mothers are misled by the professional jargon mastered by both the mother and the non-mother midwife, mother midwives pick up on features embedded in the answers. The question is: what characterises these clues?

One, perhaps trivial, difference between first-person experiences and second-hand experiences (i.e. observation of another person giving birth) has to do with emotions. Obviously, there is a world of difference between giving birth and observing someone giving birth. On the unpleasant side, delivery is painful and distressing, and on the pleasant side, along with the baby comes an irresistible beginning in the most radical sense. To most women, the act of delivery is an outstanding experience. The issue here is not to dispute the uniqueness of such experiences, but to meticulously examine whether such personal experiences actually add substantially to how one talks about these experiences. Do non-mother midwives who lack personal experiences of delivery refer differently to the putative experience to an extent noticeable in language? Perhaps the emotional taint of the personal memory unmasks the mother midwife's responses because she is likely to focus on specific details not readily available to the non-mother midwife and therefore not present in the vocabulary? This hypothesis is by no means far-fetched, since effects of emotions on language processing are well-documented (Glenberg et al. 2005). In the current study, such effects might even come in double. According to Glenberg et al. (2005), “The full understanding of language about emotional states requires that those emotional states be simulated, or partially induced, using the same neural and bodily mechanisms recruited during emotional experiences. That is, language about emotions is grounded in emotional states of the body, and simulating those states is a prerequisite for full understanding of the language.”

One objection that makes this interpretation less tenable is the human ability to empathise with peers. Current theories on mirror neurons suggest that imitation goes beyond mere copying of physical acts, such as yawning or smiling. We can even imitate gustatory emotions (Jabbi et al. 2007) and apparently induce feelings that go with specific facial expressions (Lee et al. 2006). Mirror neurons might even be involved in processing abstract language (Gallese and Lakoff 2005; Glenberg et al. 2008).

Moreover, the practice of midwifery (at least in Denmark; Footnote 7) recommends modest use of invasive treatments such as caesarean sections and instead endorses extensive empathy and caring on the part of the practitioners. So, midwifery seems to be a profession that, among other things, encourages the ability to put oneself in the place of the expecting mother to console and soothe her (Footnote 8; Schilhab 2007). We do not need theories of mirror neurons to get arguments of empathy off the ground. The power of imagination (which might be partly informed also by mirror neurons, of course) might be all we need to transcend other people's minds. In connection with the current study, an experienced non-mother midwife reported a dream she once had about herself giving birth. The quality of the dream made her momentarily confuse dream and reality after waking up. Such imaginative powers demonstrate the existence of extensive abilities to put oneself in the place of others. According to Selinger (2003), such results can be obtained because of the effectiveness of extrapolation on behalf of the body (for discussion, see Selinger et al. 2007). See also Ribeiro (2007) for discussions on the influence of merely watching peers in relation to interactional expertise.

Another issue to consider is to what degree the results are due to the abilities of the mother midwife as a judge and to what degree to the answers she gives as a respondent. In other words, is the effect a result on the part of the judge or the respondent? Do mother midwives excel in judging because they have particular “inside information” as a result of sensory–somatic experiences? Or would experiments conducted with non-mother midwives judging the experimental condition produce the same distribution? Since we meticulously followed the guidelines of the original imitation game, we did not use non-mother midwives as judges. If non-mother midwives were to produce a similar distribution of evaluations, the profession of midwifery would seem to be responsible for the obtained results. If so, we would still be left with the result that a non-mother midwife cannot pass as a mother in questions on pregnancy, delivery and post-pregnancy matters. Thus, verbal representations of bodily experiences are transmitted to the written output and support the theory of grounded cognition even within real-life verbal exchanges.

Concluding remarks

In the theory of interactional expertise, linguistic competence depends on social immersion. The current study does not substantiate that claim. Rather, it suggests that personal experiences somehow make a linguistic difference, noticeable to contributory experts. The current experiments took for granted that the demarcating level is at the level of interviewing. Perhaps careful scrutiny of statements in which respondents pretend to be mothers, as compared with honest statements, could give us a better idea of the underlying dynamics.