Introduction

Research on the dynamics of social interaction and its assessment has been dominated by studies with a key role for social cognition, focusing on cognitive processes within one individual (Blakemore & Choudhury, 2006; Brizio, Gabbatore, Tirassa, & Bosco, 2015; Hutto, Herschbach, & Southgate, 2011). These entail, for example, the ability to understand others’ mental states, or “mentalizing”, studied in both normal development (Bosco, Gabbatore, & Tirassa, 2014) and psychopathology (Penn, Sanna, & Roberts, 2008). However, findings from laboratory research on social cognition have only partly been able to explain social functioning (Fett et al., 2011; Simons, Bartels-Velthuis, & Pijnenborg, 2016), which calls into question the assumption that social cognition is a prerequisite for social functioning (Hermans et al., 2019; Schneider, Myin, & Myin-Germeys, 2019). Although many other factors can be assessed to increase the explained variance in social functioning (Barch, Pagliaccio, & Luking, 2016), it is argued that crucial information continues to be lacking if these factors only relate to one individual, i.e. the observer. As an alternative to the observer’s point of view, the interactor’s point of view reflects the ongoing interaction and constantly changing dynamics in the environment (Schilbach et al., 2013). Therefore, paradigms studying social interaction using the mutual engagement of at least two parties have been gaining increased attention (De Jaegher & Di Paolo, 2008; Froese, 2018). In the assessment of real-time interaction, both interactors engage at the same time, allowing the dynamics of social interaction itself to be captured. This goes beyond the study of social cognition as an internal mechanism, and instead focuses on basic capacities that constitute the dynamics of constantly changing behavior in interaction with body and environment.

From a very early age, infants show engagement with others and responsiveness to social cues (Reddy, 2010). The development of real-time responding to social cues and maintaining a social interaction has been studied in two-month-old babies, showing that infants could distinguish live from prerecorded interactions with their mother (Murray & Trevarthen, 1985; Nadel, Carchon, Kervella, Marcelli, & Reserbat-Plantey, 1999). The study of real-time successful interactive capacities has been continued in adults with the development of the Perceptual Crossing Experiment (PCE) (Auvray, Lenay, & Stewart, 2009; Auvray & Rohde, 2012). Within this experimental paradigm, interaction is defined as the co-regulated coupling between two autonomous individuals within their environment (De Jaegher, Di Paolo, & Gallagher, 2010). This is captured by the assessment of social contingency detection, which is defined as the sensitivity to other people’s responsiveness to one’s presence and behavior. Previous research has investigated social contingency detection in terms of objective measures of the interaction process, such as detection accuracy and turn-taking, and in terms of the participant’s subjective experience of interaction (Froese, Iizuka, & Ikegami, 2014; Zapata-Fonseca, Dotov, Fossion, & Froese, 2016). Detection accuracy entailed correct detection of the other based on real-time interaction with tactile feedback only. Turn-taking reflected a strategy that participants employed in order to detect the other. Subjective experience of interaction was measured with self-report, assessing the experience of the interaction without being given feedback on detection accuracy.

The PCE has, to date, only been implemented in adults. These studies showed evidence for accurate mutual awareness through sensorimotor coordination in a minimal virtual environment (Auvray & Rohde, 2012; Deschamps, Lenay, Rovira, Le Bihan, & Aubert, 2016; Froese et al., 2014; Kojima, Froese, Oka, Iizuka, & Ikegami, 2017; Zapata-Fonseca et al., 2016). Positive associations have been found between the mutual interaction process and subjective experience thereof, suggesting that co-regulation of interaction in real time is necessary for successful detection of social contingency (Froese et al., 2014). Some authors have even argued on the basis of the PCE that social interaction may constitute social cognition (De Jaegher et al., 2010; Froese & Di Paolo, 2011). More recently, the PCE has also been shown to distinguish interaction patterns, measured as the amount and variability of movement towards each other, in individuals with high-functioning autism from those in controls (Zapata-Fonseca et al., 2019; Zapata-Fonseca, Froese, Schilbach, Vogeley, & Timmermans, 2018). These studies, showing the ability to capture variability in capacities for social contingency detection in populations with impairments in social functioning, provided further evidence for the capacity of the PCE to investigate mechanisms of social interaction.

Although the development of social capacities starts from birth, interpersonal functioning is particularly important in adolescence, during which maturation of social skills takes place (Smetana, 2010; Smetana, Campione-Barr, & Metzger, 2006). This is also the age period in which disorders marked by changes in social behavior have their onset, such as depression, anxiety, and psychosis (Kelleher et al., 2012; Schilbach, 2016). Early detection of variability in social interaction styles associated with social impairments and their development could enable timely prevention and intervention efforts. In adolescent populations and prospective longitudinal designs, the rather extensive PCE previously used in adults – fifteen rounds following a training phase, lasting up to one hour (e.g. Froese et al., 2014; Zapata-Fonseca et al., 2018) – may benefit from a considerably shortened design. We therefore adapted and shortened the PCE to six rounds, and compared it with the original fifteen-round design with regard to the ability to assess the learning of social contingency detection. In addition, we conducted a ten-round version of the PCE in a subsample of adolescents in order to test whether social contingency detection improves or plateaus after six rounds.

Research questions and hypotheses

The aim of this study is to assess the capacity for social contingency detection using a modified, shortened version of the Perceptual Crossing Experiment (PCE) in adolescents. Accordingly, the main research questions are: (1) Does the modified six-round version of the PCE assess overall social contingency detection, measured as the amount of time spent together, correct detection of the other, and subjective experience of interaction in all rounds; (2) does it assess learning of social contingency detection, measured as an increase in levels of social contingency detection across six rounds; and (3) in a ten-round version of the PCE, does the average level of social contingency detection change in rounds seven to ten compared with rounds one to six? Specific hypotheses are detailed in supplementary material A.

Methods

Sample

Data collection took place between February 2018 and May 2019. Participants were recruited from the general population in Flanders, Belgium. Recruitment took place in secondary schools that participated in a large longitudinal cohort study on adolescent mental health and development, the SIGMA project (Kirtley et al., in preparation). Participation in the SIGMA project included 100 minutes of completing questionnaires in groups of 20 to 24 adolescents, of whom eight were randomly selected to perform the PCE. The total number of participants was 148, of whom 116 completed six rounds and 32 completed ten rounds of the PCE (see Procedure). Participants received a €10 voucher after full participation in the SIGMA study. The exclusion criterion was an inadequate level of Dutch or English and therefore failure to understand the instructions. Ethical approval was provided for the entire SIGMA study protocol including the PCE (S6 1395). This study was registered prior to conducting the analyses, after the data collection was finished, on the website of the Open Science Framework (https://osf.io/jmbdr/?view_only=9206a27ca3834a7da8da116b6154d1ad). The preregistration adheres to the disclosure requirements of this institutional registry.

Experimental setup

Participants played a game together with a randomly assigned partner. They were instructed to imagine walking through a long, dark loop corridor with this partner. This was a virtual space not visible to the participants, in which they could move an avatar back and forth on one axis with their dominant hand, using a trackball. Their task was to find their partner’s (i.e. the other) avatar in this space without communicating in any way outside the interface. The setup, in which participants sat back to back and listened to Brownian noise via headphones, prevented interactions outside the virtual space. Every time a participant’s avatar overlapped with another entity in the virtual space, that participant received tactile feedback (i.e. a vibration) on the hand moving the trackball. Within the virtual space, participants could encounter not only the other avatar, but also a “chair” and one “other moving entity”, which would each give exactly the same tactile feedback during an encounter. The other avatar was an animate, reactive entity. The “chair” was an inanimate, non-reactive entity, to be referred to as the static object. The “other moving entity” was an animate, non-reactive entity moving exactly as the other avatar but at a fixed distance of 150 pixels, to be referred to as the shadow. Each avatar could sense only its own static object, not the other’s static object, which was positioned in a different location in virtual space (Fig. 1).

Fig. 1

The virtual one-dimensional space with connected endpoints and the relation between entities; avatar A and B representing player A and B, respectively; shadow A and B representing the shadow of avatar A and B, respectively; and each player’s static object

The experimental setup was based on a previous study using the PCE in adults (Froese et al., 2014). The shared virtual space with connected endpoints consisted of 600 pixels, and the entities (static object, shadow, and other avatar) were 4 pixels wide (Fig. 1). On the virtual line, the two static objects (one for each avatar) were fixed at 150 and 450 pixels. The distance between the avatar and the shadow was 150 pixels. The tactile feedback indicating an encounter involved vibration with a fixed intensity during the crossing of another entity, so that the duration of the vibration depended on how long the avatar’s pixels overlapped with this entity. Without overlap, the vibration was off.
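The geometry of this space can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in the study: the function names are hypothetical, the direction of the shadow offset is assumed, and the overlap rule (centers less than one entity width apart) is an assumption, since the exact rule is not given here.

```python
SPACE = 600          # length of the virtual line with connected endpoints, in pixels
SHADOW_OFFSET = 150  # fixed avatar-to-shadow distance, in pixels
WIDTH = 4            # width of each entity, in pixels


def ring_distance(a, b, space=SPACE):
    """Shortest distance between two positions on a line whose endpoints are connected."""
    d = abs(a - b) % space
    return min(d, space - d)


def shadow_position(avatar, offset=SHADOW_OFFSET, space=SPACE):
    """Position of the shadow, which copies the avatar at a fixed offset.

    The sign of the offset is an assumption of this sketch.
    """
    return (avatar + offset) % space


def overlaps(a, b, width=WIDTH):
    """Assumed overlap rule: equal-width entities overlap when their
    centers are less than one width apart (vibration on)."""
    return ring_distance(a, b) < width
```

Because the endpoints are connected, positions 10 and 590 are 20 pixels apart rather than 580, which is why a wrap-around distance is used throughout the sketch.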

Procedure

The total duration of the experiment was fifteen to twenty minutes. The first five minutes were used for instruction (see supplementary material B). The experiment consisted of six or ten one-minute rounds in which participants tried to complete the task of finding each other in the virtual space, using tactile feedback only. They were instructed to press a button (i.e. click) with their free hand at the moment that they were most confident of crossing the other avatar. Participants could use the entire minute to explore the virtual space and could also choose not to click in case they did not find the other avatar. They were instructed to stay in the other avatar’s proximity after they clicked in order to help the other to complete the task in this cooperative game. Each new round started with random starting positions for both avatars. In order to find each other, participants were expected to distinguish between inanimate and animate entities (static object vs. shadow and other avatar), and between non-reactive and reactive entities (static object and shadow vs. other avatar). This was not specified in the instruction to the participants. Each round was followed by three self-report items on a tablet about the participants’ subjective experience of interaction during that previous round.

Participants did not receive any feedback on the behavior or clicking of their partner or on their own performance during the experiment. They were debriefed about the purpose of the experiment, and the strategy they could have used, after they finished the experiment and only if they were interested to know about it. Figure 2 shows an illustrative round.

Fig. 2

Recording of an illustrative round in which a pair (blue player A and red player B) interacted across the virtual one-dimensional space (y-axis) over the 60 seconds (x-axis), represented for each player separately. Solid bold blue and red lines represent the positions of the two players’ avatars. Dotted blue and red lines represent the positions of their shadows, illustrated in the upper panel in red and in the lower panel in blue. Solid light blue and red lines represent the location of each player’s static object. The vertical blue and red lines show the clicks. Blue player A clicked correctly (click assigned to the other avatar) as shown in the upper panel, while red player B clicked incorrectly (click assigned to the static object), as shown in the lower panel. Both players spent time exploring their respective static object. The red player remained with the static object, while the blue player started interacting with the red player after approximately 30 seconds

Measures

Amount of time spent together

For each entity in the virtual space (i.e. other avatar, shadow, static object), the amount of time spent with this entity was computed as the total time (in steps of 10 milliseconds) during which the distance between the entity and the participant’s avatar was below a given threshold. We set the threshold value at 70 pixels (see Froese et al., 2014). The amount of time spent together with the other avatar was defined as the time (in milliseconds) that the distance between the two participants’ avatars was below 70 pixels (hereafter referred to as time spent together).
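As an illustration of this measure, the following Python sketch counts the 10-millisecond samples in which an entity lies within the 70-pixel threshold. The function names and the list-based data layout are hypothetical, and the use of a wrap-around distance is an assumption based on the connected endpoints of the space.

```python
STEP_MS = 10    # sampling step reported in the text, in milliseconds
THRESHOLD = 70  # proximity threshold, in pixels
SPACE = 600     # length of the circular space, in pixels


def ring_distance(a, b, space=SPACE):
    """Shortest distance on a line with connected endpoints (assumed metric)."""
    d = abs(a - b) % space
    return min(d, space - d)


def time_spent_ms(own_positions, entity_positions,
                  threshold=THRESHOLD, step_ms=STEP_MS):
    """Total time in milliseconds during which the entity is closer
    to the participant's avatar than the threshold."""
    near_samples = sum(
        1
        for own, other in zip(own_positions, entity_positions)
        if ring_distance(own, other) < threshold
    )
    return near_samples * step_ms
```

Applied to the other avatar's trajectory, this yields time spent together; applied to the shadow or static object, it yields the comparison measures.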

Correct detection

Both participants within a pair clicked independently and maximally once per round. The click was assigned to the entity closest to the participant’s avatar within a distance of 70 pixels within one second before the click. This could be either the other avatar, the shadow, or the static object. If none of these entities were within the 70-pixel distance, the click was categorized as unclassified. Correct detection was defined as a click within a distance of 70 pixels from the other avatar. This is a binary variable that was calculated per round per individual. It was rated as “1” if a click was correct (i.e. assigned to the other avatar), and “0” if a click was incorrect (i.e. all the other instances where there was a click). Correct detection of the other is a variable that is independent of the other participant’s click and will be hereafter referred to as correct detection.
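The assignment rule can be sketched as follows. This is one possible reading rather than the study's code: the function names and data layout are hypothetical, the one-second window is taken as the 100 samples preceding the click, and the nearest approach within that window is used as the distance. Giving priority to the other avatar when several entities are in range follows the tie rule reported in the Results.

```python
THRESHOLD = 70      # pixels
WINDOW_STEPS = 100  # one second at 10 ms per sample
SPACE = 600         # pixels


def ring_distance(a, b, space=SPACE):
    """Shortest distance on a line with connected endpoints (assumed metric)."""
    d = abs(a - b) % space
    return min(d, space - d)


def assign_click(click_index, own, entities):
    """Return the label of the entity to which a click is assigned.

    own: avatar positions, one per 10 ms step.
    entities: dict mapping "other", "shadow", "static" to position lists
    of the same length as own.
    """
    start = max(0, click_index - WINDOW_STEPS)
    window = range(start, click_index + 1)
    nearest = {
        label: min(ring_distance(own[i], track[i]) for i in window)
        for label, track in entities.items()
    }
    in_range = {label: d for label, d in nearest.items() if d < THRESHOLD}
    if not in_range:
        return "unclassified"
    if "other" in in_range:
        return "other"  # other avatar takes priority when several are in range
    return min(in_range, key=in_range.get)


def correct_detection(label):
    """Binary outcome: 1 if the click was assigned to the other avatar, else 0."""
    return 1 if label == "other" else 0
```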

Subjective experience of interaction

In order to measure participants’ subjective experience of interaction, three items were used: “To what extent did you feel that the other could sense your presence?”, “To what extent did you feel you were doing something together?”, and “How confident were you that you clicked correctly?” Subjective experience of interaction was measured with a seven-point Likert scale ranging from “1” not at all to “7” very much. The items on the subjective experience of interaction were presented after each round to assess participants’ experience during that entire previous round. The item “How confident were you that you clicked correctly?” rated the confidence of clicks and included “I haven’t clicked” as an answer option, which was coded as a missing value. Subjective experience of interaction was computed as the average of three items after rounds including a click, and as the average of two items after rounds without a click. This variable will be hereafter referred to as subjective experience.
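The scoring rule can be illustrated with a short Python sketch. The function and parameter names are hypothetical; the confidence item is passed as None when the participant answered “I haven’t clicked”.

```python
def subjective_experience(sense_presence, doing_together, confidence=None):
    """Mean of the available Likert items (1 to 7).

    confidence is None after a round without a click, in which case
    the score is the mean of the two remaining items.
    """
    items = [sense_presence, doing_together]
    if confidence is not None:
        items.append(confidence)
    return sum(items) / len(items)
```

For example, ratings of 4, 6, and 5 after a round with a click give a score of 5.0, and ratings of 4 and 6 after a round without a click also give 5.0.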

Click success

We defined the variable Click success per round with four levels per pair. This variable was derived from correct detection, but is a paired variable where the value of the individual is dependent on the value of the other within the pair. It was coded as follows: 0 = no success (both players scored 0 on correct detection), 1 = single success (this player scored 1 on correct detection, the other scored 0), 2 = double success (both players scored 1 on correct detection within the same round, irrespective of the time interval within the round), and 3 = joint success (both players scored 1 on correct detection within a distance of 70 pixels within the same one-second time interval). No click was coded as a missing value.
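A minimal sketch of this coding scheme is given below. The function name and arguments are hypothetical. The text does not state how a player who clicked incorrectly is coded when the partner clicked correctly; the sketch assumes level 0 for any incorrect click, and treats a partner who did not click as not having detected correctly.

```python
def click_success(own_correct, partner_correct, joint=False):
    """Code click success for one player in one round.

    own_correct, partner_correct: 1 (correct), 0 (incorrect), or None (no click).
    joint: True if both correct clicks fell within the same one-second
    interval and within 70 pixels of each other.
    """
    if own_correct is None:
        return None           # no click: missing value
    if own_correct == 0:
        return 0              # incorrect click (assumed regardless of partner)
    if partner_correct != 1:
        return 1              # single success
    return 3 if joint else 2  # joint success, otherwise double success
```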

Analyses

For each analysis, age and gender were added as a priori covariates. Further, school, pair, and participant were added as levels in the multilevel analyses to account for the nesting of rounds within participants within pairs and within schools. If data were collapsed across rounds per individual, the participant level was left out. An exploratory factor analysis was conducted to statistically test whether the three items assessing subjective experience could be reduced to one or two variables.

In order to compare the mean amount of time spent together (i.e. with the other avatar) with the mean time spent with both the shadow and the static object, paired t tests on collapsed data per individual across all rounds were used (hypothesis 1a). Logistic mixed-effect regression with only an intercept was conducted to test if the intercept was equal to zero, i.e. testing if the probability of correct detection was at chance level (0.5) (hypothesis 1b). To test the hypotheses that subjective experience (dependent variable) was related to time spent together (1c), proportion of correct detection (1d), or click success (1e), multilevel mixed-effect regression analyses were estimated in three separate models. For hypotheses 1c and 1d, data were collapsed per individual across all rounds. The proportion of correct detection was calculated per individual by dividing the total number of correct clicks by the number of total clicks.
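Two of these quantities lend themselves to a brief illustration: the correspondence between a zero intercept on the log-odds scale and a probability of 0.5, and the per-individual proportion of correct detection. The Python sketch below uses hypothetical function names and does not fit the mixed-effect models themselves.

```python
import math


def inv_logit(log_odds):
    """Convert a log-odds value (e.g. a logistic intercept) to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def proportion_correct(clicks):
    """Proportion of correct clicks for one individual.

    clicks: list of 1 (correct) or 0 (incorrect), one entry per round
    in which the participant clicked.
    """
    if not clicks:
        return None  # no clicks at all: proportion undefined
    return sum(clicks) / len(clicks)
```

An intercept of zero corresponds to inv_logit(0.0) = 0.5, so testing the intercept against zero is equivalent to testing the probability of correct detection against the 0.5 reference level.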

Multilevel mixed-effect (logistic) regression analyses were fitted to examine whether the three main outcome variables time spent together, correct detection, and subjective experience were predicted by round as independent variable (hypotheses 2a, b, and c). Random intercept and slope were allowed, and only linear models were fitted.

Finally, to examine whether there was a difference in the average level of main outcome variables (variables time spent together, correct detection, and subjective experience) in rounds one to six versus rounds seven to ten, a dummy level (rounds 1 to 6 = 0; rounds 7 to 10 = 1) was used as predictor variable (hypotheses 3a, b, and c). For these analyses, only the data from a subsample of 32 participants who completed ten consecutive rounds were used. All analyses were preregistered (confirmatory).

Results

Sample and data characteristics

Descriptives

The initial sample included 164 participants. Sixteen participants were excluded from analyses because of technical issues with the apparatus. The final sample included a total of 148 participants, of whom 116 completed six rounds and 32 completed ten rounds of the PCE. Forty participants were attending first year, 32 third year, and 76 fifth year in the secondary education system in Belgium. The age ranged from 12 to 19 years.

Visual inspection of the outcome variables and Shapiro–Wilk tests showed right-skewed distributions of the variable time spent together. In order to meet the requirement for normal distribution of outcome variables, the variables reflecting time spent together (as well as time spent with the shadow and the static object) were square-root-transformed. This transformation was selected because the data were right-skewed and included zero values. In reporting the results, the mean values were back-transformed by squaring the values. To facilitate interpretation, we reported time spent together in seconds.
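The transformation and back-transformation can be illustrated as follows. This is a sketch with a hypothetical function name and made-up example values; it operates on raw values, whereas the reported means may derive from model estimates.

```python
import math


def sqrt_mean_back_transformed(values):
    """Mean computed on the square-root scale, then squared to return
    to the original unit (e.g. seconds)."""
    roots = [math.sqrt(v) for v in values]
    mean_root = sum(roots) / len(roots)
    return mean_root ** 2
```

For the illustrative values 0, 4, and 16, the back-transformed mean is 4.0, whereas the arithmetic mean is about 6.67. A back-transformed mean is therefore not the arithmetic mean of the raw values, and it is less sensitive to the long right tail.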

Exploratory factor analysis on subjective experience of interaction

Across rounds and per round, the three items used in the current study showed a significant Bartlett’s test and a KMO [Kaiser–Meyer–Olkin] above .5, fulfilling the requirements to conduct a factor analysis. The results indicated presence of one underlying factor. Our subjective experience score therefore reflected the explicit awareness of the other and the other’s conscious awareness of the self, in combination with confidence of the presence of the other. Across rounds and per round, the mean score subjective experience indeed supported one underlying factor with an eigenvalue above 1. Mean scores of each item showed low uniqueness values (ranging from .10 to .39), indicating that their variance was well explained by the variable subjective experience. The inter-item reliability of the three items per round was high, with a Cronbach’s alpha ranging from .85 to .92.

Social contingency detection across rounds

Time spent together

Across rounds, time spent together (mean = 20 s) was significantly higher than time spent with the shadow (mean = 11 s; p < .001) and the static object (mean = 18 s; p = .002). Time spent with the shadow was significantly lower than time spent with both the other avatar and the static object (p < .001).

Correct detection

Correct detection of the other was not at chance level (p = .010). Participants clicked during 79.2% of rounds (805 out of 1016 potential clicks). In 41.9% of these cases, the click was correct, i.e. detection of the other was successful. In 6.6% of cases, both the other avatar and either the other avatar’s shadow or the static object were within 70 pixels of the avatar. In these cases, a click was assigned to the other avatar, i.e. defined as correct. Clicks were assigned to the static object in 33.8% of total clicks and to the shadow in 15.3% of total clicks. In 9.1% of cases, the clicks were categorized as unclassified because these did not occur within a distance of 70 pixels from any of the entities.
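The number of potential clicks follows from the sample composition, with at most one click per participant per round. The short Python check below reproduces the arithmetic; the variable names are illustrative.

```python
six_round_participants = 116
ten_round_participants = 32

# One potential click per participant per round
potential_clicks = six_round_participants * 6 + ten_round_participants * 10

actual_clicks = 805
click_rate = round(100 * actual_clicks / potential_clicks, 1)  # percentage of rounds with a click
```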

Subjective experience

Subjective experience increased when more time was spent together, although this did not reach statistical significance (B = .04 (.02), 95% CI: −.00 to .08, p = .074) whereas subjective experience was significantly associated with click success (B = .18 (.04), 95% CI: .10 to .27, p < .001). It was not associated with proportion of correct detection (B = −.01 (.41), 95% CI: −.8 to .79, p = .973).

Click success (associated with subjective experience)

Across rounds, 58.1% of clicks were incorrect, 22.4% were single successes, 16.6% were double successes, and 2.9% were joint successes. Due to the low frequency of joint successes, these were counted as double successes, resulting in 19.5% of clicks falling into this category. Compared with an incorrect click, double success was associated with a significant increase in subjective experience (B = .23 (.07), 95% CI: .10 to .36, p = .001). The difference between single success and double success was also associated with a significant increase in subjective experience (B = .22 (.08), 95% CI: .07 to .38, p = .004). The difference between incorrect clicks and single success clicks was not associated with an increase in subjective experience (B = .01 (.06), 95% CI: −.12 to .13, p = .929).

Learning of social contingency detection across rounds

For each outcome variable, we tested whether the average level changed throughout the experiment, from round one to round six. First, round was not significantly associated with time spent together (p = .719), such that time spent together remained at a similar level during the experiment. This is illustrated in Fig. 3, in addition to showing that the time spent with the shadow remained at a similar (lower) level. Moreover, the illustration shows a decreasing trend of time spent with the static object after the third round. Second, round was associated with a significant increase in the probability of correct detection (B = .07 (.04), 95% CI: .00 to .14, p = .05), such that there is some indication that the probability of correct detection increased per round. Click assignment started in the first round more or less at random with about the same number of clicks assigned to the other avatar, static object, or shadow. With successive rounds, there was a clear increasing trend, with half of the clicks assigned to the other avatar at the sixth round. The other half of click assignment was distributed over the other entities with decreasing numbers to the static object and the shadow. Correct detection was 25% after round one and 50% after round six. Lastly, round was associated with subjective experience (B = .07 (.01), 95% CI: .04 to .10, p < .001), such that subjective experience increased across rounds.

Fig. 3

The amount of time spent with entities per round. The bold line represents the amount of time spent with the other avatar (ava), the light line represents the amount of time spent with the static object (static), and the dotted line represents the amount of time spent with the shadow (shadow)

Comparison of average social contingency detection levels between the six-round version and extended ten-round version

In order to compare the six-round version with a ten-round version, a subsample of 32 participants who performed ten rounds was used for the analysis. The dummy variable reflecting either rounds one to six (0) or rounds seven to ten (1) was not significant in any of the associations tested for research question 3. This indicated that there was no evidence for a change in average level of time spent together, correct detection, and subjective experience in rounds one to six compared with the average level in rounds seven to ten. The results for time spent together and correct detection are illustrated in Fig. 4, in which the average for these variables is shown per round.

Fig. 4

The average time spent with entities in seconds (upper panel) and the average correct detection in percentage of total clicks (lower panel) per round for the subsample of 32 participants who completed ten consecutive rounds. In the upper panel, the bold line represents the amount of time spent with the other avatar (ava), the light line represents the amount of time spent with the static object (static), and the dotted line represents the amount of time spent with the shadow (shadow)

Covariates

Age was a significant covariate in testing the association between round and correct detection, such that there was a significant positive effect on the average level of correct detection. Gender was not significant in any of the associations.

Discussion

Main findings

This is the first study that used the PCE in adolescents in order to assess real-time social contingency. Our results showed that the six-round version of the PCE had the capacity to assess social contingency detection in adolescents, across all rounds, in terms of amount of time spent together and correct detection of the other. Across rounds, correct detection of the other improved and the level of subjective experience of interaction increased. Importantly, we found subjective experience to be increased for rounds with double correct clicks compared with rounds with single and incorrect clicks. The average level of social contingency detection did not change significantly in rounds seven to ten compared with rounds one to six.

Comparison with previous findings

Overall, our six-round setup in adolescents has shown a similar capacity for assessing social contingency detection as was shown in previous studies in adults that used a more extended setup (e.g. Auvray et al., 2009; Froese et al., 2014). This indicates that the setup used in this study is feasible in an adolescent population, and that the shortened version has a similar capacity for assessing social contingency detection as the longer version used previously. We have shown that correct detection of the other was, on average, not at chance level. Further, we reported a similar percentage of absent clicks compared with Froese et al. (2014). In contrast, differences between number of clicks assigned to the other avatar and the static object were less marked compared with this previous adult study. This may be explained by the current setup’s absence of training rounds, which were included by Froese et al. (2014). In this training phase, participants became familiar with distinguishing the regular stimulation received while moving back and forth across a static object, and the comparatively regular stimulation received when two players engaged in a coordinated back-and-forth interaction (Di Paolo, Rohde, & Iizuka, 2008). Indeed, Fig. 3 indicates that players started spending less time with the static object after three rounds, suggesting that they distinguished this entity from the other avatar after having experienced both stimulations in the first three rounds. Further, we reported a lower correct detection rate compared with Froese et al. (2014). This difference could be interpreted in light of a more advanced level of decision-making in adults, specifically affecting the decision to click. That is, although adolescents’ number of explicit judgments about an interaction (i.e. clicking) was lower than what was found in adult studies, they did spend the most time together and less time with other entities, which was also expected based on these previous adult studies (Auvray et al., 2009; Froese et al., 2014). They were also successful in ignoring the shadow, as evidenced by spending the least amount of time with this entity, most likely because of its unstable, non-responsive character, which did not need sustained attention to successfully reach the goal of the task. In other words, while participants were successful at engaging in interaction, they did not make this explicit as often as adults did. Although we cannot conclude from our findings whether this is due to being unaware of the other or to lacking judgment while being aware of the other, we argue that this difference is likely to be explained by the age difference between the compared samples. Indeed, age had a significant positive effect on the average level of correct detection. This warrants future subgroup analyses of age, for instance to test the hypothesis that the capacity of making explicit judgments about social contingency continues to develop during adolescence.

This is the first study to replicate the increase in subjective experience in cases of mutual correct detection compared with single detection and incorrect detection, as was found by Froese et al. (2014). This serves as proof of principle that the subjective experience of social interaction is not something specific for one individual in the interaction or related to social cognitive capacities of one individual, but rather comes about as the result of a dynamic coupling of two individuals in the interaction. The partners in the dynamical system experience the most interaction when both have detected the social contingency. This was further supported by the different associations between subjective experience and, on the one hand, proportion of correct detection (individual variable), and, on the other hand, click success (paired variable). As the three items assessing subjective experience formed one single factor, this indicated that participants were particularly aware of the other participant via the other’s interactional directedness toward themselves. Taken together, these results demonstrate the importance of studying social interactive capacity for social contingency detection that is associated with the experience of interaction, rather than studying cognitive processes internal to the individual’s brain (Buzan, Kupfer, Eastridge, & Lema-Hincapie, 2014).

Our findings in random pairs from the general population showed that time spent together did not change per round, indicating that participants kept exploring the space rather than increasingly staying with the other avatar. In contrast, Zapata-Fonseca et al. (2018) found that controls decreased their exploring behavior in interaction with individuals with autism spectrum disorder. This may also be explained by the difference in age, with an adult population more easily reaching a decision and sticking to what they think is the other person. Alternatively, it may be due to the sample characterization, with the healthy controls adapting their interaction strategy to their partner with autism spectrum disorder.

Are six rounds sufficient to capture social contingencies with the PCE in adolescents?

As illustrated in Fig. 4, the level of time spent together, correct detection, and subjective experience did not further increase after six rounds when the experiment was extended with four additional rounds. After four rounds, there was a decrease in the percentage of correct detection. This sudden drop to nearly the participants’ average starting level suggests that something changed in explicitly making judgments about the interaction after a few rounds. This could be explained by the concept of (reinforcement) learning, including implicit and explicit learning (e.g. Barch et al., 2017; Berridge, 2004). The literature on sensorimotor learning in particular has suggested that this starts with implicit learning, followed by explicit learning (Taylor & Ivry, 2011; Taylor, Krakauer, & Ivry, 2014). These previous studies showed a decrease in performance when participants started to employ an explicit strategy to reach a goal, and it was suggested that this is due to a shift from action based on sensory-prediction error (i.e. difference between actual and predicted outcome) to action based on target error (i.e. difference between actual and targeted outcome). The latter can be interpreted as a shift to problem solving, in which participants attempt to use a cognitive strategy, which first leads to worse performance but is followed by a synergy of both ways of learning. This idea is in line with our findings, which showed an increase in performance again after four rounds. It is also a hypothesis that requires further study, as the performance stabilized at a similar level as before, which might be lower than expected from a synergistic mode of sensitivity to social contingency detection. Nevertheless, this stabilization of performance did provide evidence that six rounds are sufficient to capture a stable level of social contingency detection and learning thereof.
Our findings are also in concordance with subjective free-text reports obtained within the fifteen-round version by Froese et al. (2014), suggesting that players became aware of the other after only a few rounds. The greater variation in time spent with entities in Fig. 4 compared with Fig. 3 is probably due to the smaller sample size used in the analysis of the ten-round version of the experiment. Indeed, the standard error for the mean values given in Fig. 3 was lower than the standard error for the mean values given in Fig. 4 (supplementary material C). This was the case for each round, except for the fourth round, in which the standard deviation and standard error were higher in Fig. 3 than in Fig. 4. The drop in time spent together during this round is in line with the sudden drop in the proportion of correct detection during this round, potentially explained by the earlier mentioned concept of (reinforcement) learning. Overall, based on these results, a six-round version of the PCE seems reliable and valid in an adolescent sample. It would therefore be interesting to use this setup in order to further investigate the reason underlying the lower correct detection rate in adolescents compared with adults.
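
The relation between sample size and the standard error of the mean referred to above can be illustrated with a minimal sketch. It assumes the conventional definition (sample standard deviation divided by the square root of the number of observations); the function name and the example values are hypothetical and are not taken from the study data.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(values) / math.sqrt(n)


# Hypothetical per-participant proportions of time spent together
# (illustrative numbers only, not data from the experiment).
small_sample = [0.40, 0.55, 0.35, 0.60]
# Same values repeated: similar spread, four times as many observations.
large_sample = small_sample * 4

if __name__ == "__main__":
    print("SE, small sample:", round(standard_error(small_sample), 4))
    print("SE, large sample:", round(standard_error(large_sample), 4))
```

With a comparable spread of values, the larger sample yields the smaller standard error, which is the pattern that would be expected when a six-round analysis is based on more participants than a ten-round analysis.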

Future considerations regarding methodology

The 70-pixel interval used for click assignment could be tailored to the data to determine the specific sample’s optimal proximity range. This may be important in samples characterized by different styles of decision-making compared with healthy adults, such as in patients with psychosis (Garety et al., 2018), or in a younger population such as the sample used in the current study (Crone, 2013). Another consideration is to include measures of mutual coordination, for example by using complexity matching at the pair level (Kojima et al., 2017; Zapata-Fonseca et al., 2019), or time series analysis for turn-taking (Zapata-Fonseca et al., 2016). While a correct click was defined as the correct, but explicit, detection of the other from the experimenter’s point of view, the actual interaction or co-regulation might not always need to be made explicit in order to be successful from the participants’ subjective perspective. Indeed, our correct detection rate in adolescents was lower than that in adults. Further, Zapata-Fonseca et al. (2018) have shown that click correctness did not distinguish participants with high-functioning autism from controls, while interaction patterns differed. Potential ambiguity in the interpretation of quantitative findings could be resolved by including a qualitative aspect and comparing this with the quantitative findings (Froese et al., 2014). Finally, we expect the PCE to explain more variance in social interaction than less ecologically valid experiments that focus on the individual. This hypothesis needs to be substantiated by first studying associations between our experimental findings and other ways of measuring social interaction, such as retrospective self-report questionnaires and momentary assessments in daily life.
This would also provide studies investigating social interaction and social functioning with a paradigm to answer research questions about underlying mechanisms of social behavior, its development, and its potential variability within and between individuals.
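
As an illustration of how the proximity range for click assignment mentioned above might be varied, the following minimal sketch assigns each click to the nearest entity within a maximum distance and reports the proportion of clicks that can be assigned for a given range. Several details are assumptions made for illustration only: the one-dimensional wrap-around space of 600 units, the reading of the 70-pixel interval as a maximum distance, the entity labels, and all example positions. It is not the assignment procedure used in the present study.

```python
def ring_distance(a, b, ring_length):
    """Shortest distance between two positions on a wrap-around line."""
    d = abs(a - b) % ring_length
    return min(d, ring_length - d)


def assign_click(click_pos, entity_positions, max_distance=70.0, ring_length=600.0):
    """Return the label of the nearest entity within max_distance, else None.

    ring_length=600.0 and the interpretation of max_distance are assumptions.
    """
    nearest, best = None, None
    for name, pos in entity_positions.items():
        d = ring_distance(click_pos, pos, ring_length)
        if d <= max_distance and (best is None or d < best):
            nearest, best = name, d
    return nearest


def assignment_rate(clicks, max_distance, ring_length=600.0):
    """Proportion of clicks that fall within max_distance of any entity."""
    if not clicks:
        return 0.0
    assigned = sum(
        1
        for pos, entities in clicks
        if assign_click(pos, entities, max_distance, ring_length) is not None
    )
    return assigned / len(clicks)


# Hypothetical clicks: (click position, entity positions at the time of the click).
clicks = [
    (100.0, {"other": 130.0, "shadow": 300.0, "static": 450.0}),
    (590.0, {"other": 20.0, "shadow": 200.0, "static": 400.0}),
    (250.0, {"other": 350.0, "shadow": 160.0, "static": 500.0}),
]

if __name__ == "__main__":
    # Sweep candidate ranges to see how many clicks each one would assign.
    for candidate in (50.0, 70.0, 100.0):
        print(candidate, assignment_rate(clicks, candidate))
```

Sweeping candidate ranges in this way only shows how many clicks would be assigned; choosing an optimal range for a given sample would additionally require a criterion, such as agreement with participants' reports, that is not specified here.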

Conclusion

The current findings indicate that the assessment and learning of social contingency detection can be achieved in an adolescent population by using a short and simple setup, without requiring training or complicated instructions. The potential role of age in social contingency detection warrants its inclusion in prospective studies that will aid in elucidating the complex nature of social interaction, even more so if the link with social functioning can be established.

Open Practices Statements

None of the data or materials for the experiments reported here is available. The study was preregistered at the website of the Open Science Framework, available via https://osf.io/jmbdr/?view_only=9206a27ca3834a7da8da116b6154d1ad. Discrepancies between the preregistration and the final report are detailed in supplementary material D.