1 Introduction

Learning processes such as operant and respondent conditioning are thought to play a major role in chronic primary back pain (CBP), where they lead to fear of pain and avoidance behaviors (Flor and Turk 2011). It is assumed that pain and catastrophizing thoughts feed into the development of pain-related fears, which lead to the avoidance of activities such as movement. This acquired behavior, which interferes with daily-life activities, results in maladaptive consequences such as disuse, functional disability, negative affect and depression, which in turn contribute to the persistence of pain (Crombez et al. 2012). Avoidance can take different forms, from complete avoidance of specific movements (Volders et al. 2015) to safety-seeking behavior in guarded movements (Tang et al. 2007) and decreased movement variability (van Dieën et al. 2017). Fear avoidance models of chronic pain (Lethem et al. 1983; Vlaeyen and Linton 2000) explain the maintenance of chronic pain as part of such a vicious cycle of fear of pain, avoidance behavior, resulting interference with daily activities and the development of negative affect, which feeds back into an amplification and continuation of pain (Vlaeyen et al. 2016).

Accordingly, cognitive-behavioral treatments focus on the reduction of catastrophizing and pain-related fear and avoidance by having patients execute feared pain-related movements in a safe environment (Flor and Turk 2011; Main et al. 2014). Such exposure treatments in chronic pain (Vlaeyen et al. 2012), a tool from cognitive-behavioral therapy originally inspired by treatments of anxiety disorders, let patients execute potentially painful movements to extinguish fear and avoidance, which has been shown to reduce fear of movement and correct expectancies of pain and harm (Woods and Asmundson 2008), with effects generalizing over different tasks (Trost et al. 2008). The importance of pain expectancies in this process is highlighted by classical placebo effects that are elicited by social cues (Colloca et al. 2013; Klinger et al. 2017), and by observational placebo effects of other persons’ display of pain relief (Colloca and Benedetti 2009). Vicarious learning and observational modeling shape expectancies and behaviors in general (Bandura 1986) and in chronic pain, both in its development and with respect to therapeutic approaches (Goubert et al. 2011). In CBT treatments of chronic pain, therapists and other patients can serve as models for the execution of feared pain-related behaviors and behaviors are often videotaped and fed back to the patients, accompanied by explicit psychoeducation on the mechanisms behind chronic pain (e.g., Flor and Turk 2011; Vlaeyen et al. 2012). Showing patients how to perform adaptive healthy behaviors is also important in physical therapy related to chronic pain (e.g., Marich et al. 2018).

Virtual reality (VR) techniques provide an additional tool to extend and support pain treatment approaches that are based on physical training and CBT techniques. Home-based VR can offer an accessible, cost-efficient and easy-to-use tool to provide CBT skill-based trainings to a large group of patients, as a recent study in participants with CBP has demonstrated (Garcia et al. 2021). Thus, current VR approaches transcend distraction analgesia, the first successful therapeutic application of VR in pain (Hoffman et al. 2000; Peterson et al. 2021). Stimulating physical exercise is a main goal of VR applications in chronic pain, as a recent review of VR studies in CBP has highlighted (Bordeleau et al. 2022). However, only a few studies use highly immersive technology for this purpose, usually by motivating movements via game-like elements (France and Thomas 2018; Jansen-Kosterink et al. 2013) or by feedback displayed on a virtual character (Alemanno et al. 2019), with a recent study successfully adapting such an exercise game to graded exposure schedules (Hennessy et al. 2020). Similarly, virtual reality training has been shown to improve symptoms of CBP in athletes (Nambi et al. 2021), and meta-analyses of non-immersive and immersive VR training treatments of CBP reveal promising results (Brea-Gómez et al. 2021), although the comparison to other treatments such as physical exercise is still inconclusive (Grassini 2022).

Other approaches address the significance of the plasticity of self-perception in modulating pain (Matamala-Gomez et al. 2019a, b). For example, the embodiment of virtual limbs (Slater et al. 2008), which may be evoked by colocation and synchronous visuotactile stimulation of the real limb and its virtual counterpart, can lead to a type of visually induced analgesia in both experimental pain settings (Nierula et al. 2017) and with respect to chronic pain (Matamala-Gomez et al. 2019a, b). While this research focuses on embodiment of avatars from a first-person perspective (Kilteni et al. 2012; Slater et al. 2009), VR also enables encounters with virtual doubles, or “doppelgangers”, in a third-person perspective (Bailenson 2012). This term refers to virtual lookalike characters of the respective observers. Although some studies employ full or temporal control of virtual doppelgangers on behalf of their observers (Gorisse et al. 2019; Slater et al. 2019), the distinctive doppelganger feature enables the possibility to display behaviors that are unfamiliar to the observer (Bailenson 2012). Like their counterparts in literary fiction (Nilsen 1998), virtual doppelgangers thus subvert the boundary between “self” and “other” when they act “autonomously”. This can be used to stimulate observational modeling by maximizing model-observer similarity, which plays an important facilitative role in observational learning (Braaksma 2002; Dove and McReynolds 1972; Hoogerheide et al. 2018). For example, virtual characters that resemble their observers in facial features and perform sportive exercise with concurrent virtual weight loss can motivate exercise behavior in healthy observers (Fox and Bailenson 2009). In accordance with these findings, high identification with doppelgangers was found to mediate an increased imitation in lateral spine flexion in healthy participants (Kammler-Sücker et al. 2021).

The current study set out to explore doppelgangers as a means to decrease movement-related fear avoidance behaviors in chronic back pain by stimulating observational modeling. We hypothesized that observing a doppelganger would lead to expectancy transfer, in that a doppelganger which performs potentially painful movements without any display of discomfort or pain would decrease participants’ pain expectancies with respect to their own movements, and enhance their voluntary motor imitation. The current study explored if the demonstration of back-related movements by a virtual doppelganger, compared to videos of these movements in a control group, would lead to reduced fear of pain and better performance of the movements, as assessed by motion tracking. By repeating the virtual experience for a total of three sessions, we could also test if this effect would be immediate or required training.

2 Methods

2.1 Participants

We tested 34 participants with chronic back pain (for details on the a priori power analysis, cf. Supplemental Materials). Eligibility criteria were chronic back pain lasting for more than 6 months and an age of 18–75 years. Exclusion criteria were any acute primary causes for back pain (e.g., injuries or inflammation), acute neurological complications, inability or medical prohibition to lift weights of up to 15 kg, and a history of epileptic seizures triggered by flickering lights. Eligibility and exclusion criteria were checked in an initial telephone interview and confirmed upon arrival in the laboratory. As this was an early-stage exploratory study of a potential add-on tool for current pain treatment protocols, participants were not required to pause any ongoing pain treatments but were advised to keep them constant during the assessment time. Although not a requirement, all participants turned out to be naïve to VR treatments of back pain. Participants were randomly assigned to the two intervention groups AVA (experimental group: doppelganger avatar) and VID (control group: videotaped movement model). All participants completed the experiment. Data of one participant were excluded from the analysis due to concurrent migraine on the days of two sessions, which may have interfered with perception of the virtual environment. The two groups did not significantly differ in gender, age, level of education and pain characteristics (see Table 1). Participants were mainly recruited by press releases issued online and in local newspapers and received 50.00 € for their participation. Informed consent was obtained and the study was approved by the Ethics Committee II of the University of Heidelberg (Medical Faculty Mannheim).

Table 1 Demographic and clinical characteristics of the participant sample

2.2 Baseline assessment

In the baseline assessment, the inclusion and exclusion criteria were verified and the participants completed a set of questionnaires that included a description of their pain and related clinical variables. We employed the German version of the West Haven-Yale Multidimensional Pain Inventory (Flor et al. 1990; Kerns et al. 1985), which assesses the impact of pain on respondents’ daily lives, the reactions of others to their pain, and the extent to which the patients participate in daily activities. We focused on the first section with five subscales (Pain Intensity, Interference, Affective Distress, Social Support, Life Control). Sum scores are retrieved from the average responses to the scale items on a seven-point rating scale (0 to 6). Participants also completed the Graded Chronic Pain Scale (GCPS) (Von Korff et al. 1992), which allows for a grading of chronic pain (grades 1 to 4), based on intensity and pain-related disability inferred from responses on ten-point rating scales. To assess potential symptom burdens of anxiety and depressive mood, we also administered the Hospital Anxiety and Depression Scale (HADS) (Herrmann et al. 1995; Zigmond and Snaith 1983), which measures these two dimensions with separate scales (scored 0 to 21), summing up responses on four-point rating items. To assess functional capacity in daily life, we used the Hannover Functional Ability Questionnaire (FFbH) (Raspe et al. 1990; Raspe and Kohlmann 1991), which derives a percentage score for functional capacity (0 to 100) from three-point rating scales. Cognitive expectancies related to fear and avoidance were assessed with the Fear Avoidance Belief Questionnaire (FABQ) (Pfingsten et al. 2000; Waddell et al. 1993), which has two subscales with pain beliefs and harm expectations with respect to physical activities in general, and to job-related activities in particular. We used the physical activities subscale, which retrieves a sum score (0 to 24) from responses on six-point rating scales.

2.3 Experimental design

The experiments were conducted in a four-sided Cave Automatic Virtual Environment in the VR Core Facility at the Center for Innovative Psychiatric and Psychotherapeutic Research (CIPP) at Central Institute for Mental Health (Mannheim, Germany). This allowed participants to see real-world objects clearly through the shutter-glasses and hence safely interact with them (e.g., with the crate of water bottles in one of the movement tasks). The experiment comprised a preparatory and baseline assessment session followed by three experimental sessions (Fig. 1). We chose three experimental sessions to allow for an elementary time course analysis over sessions while at the same time keeping the effort low for participants in this exploratory study. During the baseline assessment and preparatory session (session 0), participants were informed about the experimental procedures and data management, signed informed consent and completed the assessment. They were familiarized with the laboratory environment and participants in the AVA group had their 3d photographs taken for later crafting of their doppelganger avatars (using a Kinect Sensor, Microsoft, Redmond, WA). The subsequent three experimental sessions (sessions 1–3) were at least 4 and a maximum of 117 days apart. Due to the pandemic situation we had to reschedule the participants when there was a ban on laboratory activity. The mean duration between sessions was 13.65 ± 16.08 days and not significantly different between the groups (Mann–Whitney U test, W = 620, p = 0.32). In each VR session, an initial assessment of pain expectancy and current pain state (see the section on questionnaire assessment below) was followed by the actual VR experiment. Participants in the experimental group encountered their virtual doppelganger avatars in the virtual environment (AVA), whereas the control group saw a virtual 2d screen inside the virtual environment showing a videotaped movement model (VID). Neither group was aware of the other branch of the experiment; the two-group design was disclosed to the participants only after the last session. In the virtual environment, participants watched the virtual movement model perform a specific movement and then copied the movement three times, prompted by a virtual sign and an auditory signal (for details on technical setup and stimulus design, cf. Supplemental materials).

Fig. 1

Experimental setup. A Flow of experimental sessions. In a preparatory and baseline assessment session (session 0), participants were informed about the experiment, answered pen-and-pencil questionnaires, and were 3d-scanned (AVA group). In sessions 1–3, the actual VR movement experiment took place, accompanied by pre and post questionnaires (pen-and-pencil and with a remote control within-VR). B Sample avatars of participants. C Re-staging of: scan procedure with handheld Kinect sensor (upper left), CAVE setup and motion capture markers (upper right), and within-VR questionnaires (lower panel). D Re-staging of experiment for VID (right column) and AVA (left column) for all movements: crate-moving (CM), rotation in horizontal plane (RH), and lateral flexion (bending sideward, BS). Questionnaires: FABQ PA—Fear Avoidance Belief Questionnaire, Physical Activity Subscale; FFbH FC—Hannover Functional Ability Questionnaire, Functional Capacity Score; GCPS—Graded Chronic Pain Scale; HADS—Hospital Anxiety and Depression Scale (Anxiety, Depression); MPI1—West Haven-Yale Multidimensional Pain Inventory, Part 1 (Pain Intensity, Interference, Affective Distress, Social Support, Life Control)

The avatars were animated with prerecorded movements of a healthy model (motion capture with an infrared 12-camera system, OptiTrack, Corvallis, OR), who was also shown in the video recordings for the VID group. The movements were adapted from recent studies on pain-related movement kinematics (Laird et al. 2014) and expectancies (Klinger et al. 2017; Schmitz et al. 2019): lateral flexion of the spine (“bending sideward”, BS), spinal rotation in the horizontal plane (RH) and picking up a crate with water bottles (weight: 13 kg), putting it on a chair and moving it back to the floor (“crate-moving”, CM). This procedure was repeated for all three movement types, with three repetition cycles for the entire sequence (order of movement types randomized between cycles). In total, participants were asked to repeat each movement 9 times during each session. They could skip or shorten any movement cycle with a hand gesture at any time. After the experiment and still immersed in the VR, participants used a remote control to give ratings on a numeric rating scale (NRS) on questions regarding the virtual model and virtual environment. Following the VR sessions, they answered questions about the movements and accompanying pain in paper and pencil format (described below).
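The trial structure described above (three cycles, each presenting all three movement types in random order, with three repetitions per presentation) can be sketched as follows. This is a minimal illustration only: the function name, the seeded random generator and the assumption that a fresh order is drawn in every cycle are ours and are not taken from the study software.

```python
import random

# Movement codes as used in the text: bending sideward, rotation in the
# horizontal plane, crate-moving.
MOVEMENT_TYPES = ("BS", "RH", "CM")


def build_session_schedule(rng, cycles=3, repetitions=3):
    """Return a list of (cycle, movement, repetition) trials for one session.

    Hypothetical helper: assumes a new random order of movement types
    is drawn for every cycle.
    """
    schedule = []
    for cycle in range(1, cycles + 1):
        order = list(MOVEMENT_TYPES)
        rng.shuffle(order)  # randomize movement order within this cycle
        for movement in order:
            for repetition in range(1, repetitions + 1):
                schedule.append((cycle, movement, repetition))
    return schedule


schedule = build_session_schedule(random.Random(0))
per_movement = {m: sum(1 for _, mv, _ in schedule if mv == m) for m in MOVEMENT_TYPES}
print(per_movement)
```

With the default settings, every movement type appears 3 × 3 = 9 times per session, matching the count given in the text.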

2.4 Questionnaire assessment

Participants answered questions at three time points during the experimental sessions: in the beginning, after the movements when still in the virtual environment, and after having left the virtual environment. At the beginning of every experimental session, participants completed the FABQ again, together with the first section of the Multidimensional Pain Inventory (MPI1), which comprises the scales regarding pain intensity, interference, affective distress, social support and life control. They also reported their current pain level on a discrete numeric rating scale (NRS) from 0 (“no pain at all”) to 10 (“most intense pain imaginable”), and the extent to which they feared that the following three movement types would amplify their back pain, i.e., the pain expectancy, on an NRS from 0 (no pain expected) to 4 (full agreement with the statement that movements would lead to pain), cf. (Klinger et al. 2017; Schmitz et al. 2019).

When still immersed in the virtual environment, participants answered an “engagement” question on whether they went to the limits of their capacity during the experiment, referring to all three movements together with an NRS from 0 (complete disaffirmation) to 6 (complete affirmation). After the VR part, every session was concluded by three questions (Klinger et al. 2017; Schmitz et al. 2019) for each movement (BS, RH, CM). The participants’ own perception of their ability was assessed by asking them whether they could perform the movement with an NRS from 0 (not at all) to 3 (unrestricted yes). They were also asked how strongly they felt limited in the movement by their pain with an NRS from 0 (no limitation at all) to 10 (complete incapability), and they reported their pain during the movement on an NRS from 0 (no pain) to 10 (most intense pain imaginable). Participants’ self-reported ability to perform each movement was then pooled to gain a functional capacity score in percent (average over all three movements, converted into percent of the maximal score possible). Movement limitation by pain and pain during the movements were also each averaged over all three movements, resulting in a score between 0 and 10.
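The pooling of the post-movement ratings can be illustrated with a short sketch. It assumes that ratings are stored per movement in a dictionary keyed by the movement codes BS, RH and CM; the function names are hypothetical and serve only to make the arithmetic explicit.

```python
from statistics import mean

MOVEMENTS = ("BS", "RH", "CM")
MAX_ABILITY = 3  # upper end of the 0-3 ability NRS described in the text


def functional_capacity_percent(ability):
    """Average the 0-3 ability ratings and express them as percent of the maximum."""
    ratings = [ability[m] for m in MOVEMENTS]
    return 100.0 * mean(ratings) / MAX_ABILITY


def mean_over_movements(ratings):
    """Average a 0-10 rating (limitation or pain) over the three movements."""
    return mean(ratings[m] for m in MOVEMENTS)


# Example with made-up ratings for a single session.
capacity = functional_capacity_percent({"BS": 3, "RH": 2, "CM": 1})
pain = mean_over_movements({"BS": 0, "RH": 10, "CM": 5})
print(round(capacity, 1), pain)
```

In this made-up example the ability ratings average to 2 out of 3, so the functional capacity score is about 66.7%, and the pain score is 5 on the 0 to 10 scale.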

After the movement task but still immersed in the VR, participants answered questions on their perception of and identification with the model, using the Autonomous Avatar Questionnaire with the three scales “identification/affiliation”, “perceived situational pleasantness for movement model” and “changes in body perception” (Kammler-Sücker et al. 2021). These showed high internal consistencies and were adapted by dropping single items to fit the experimental setup in this study (resulting in an AAQ-multimedia version, or AAQmm, see Supplemental Materials, Supplemental Tables 7–9). To assess the perception of the virtual environment (Slater and Usoh 1993), the Igroup Presence Questionnaire (IPQ) was employed, which measures participants’ presence in the virtual environment by asking for their involvement, experienced realism of the VR, and how strongly they felt relocated into the virtual world (Schubert et al. 2001). Both the AAQmm scales and the IPQ subscales were used as predictors for the analysis of VR-related influences on ROM. We also assessed symptoms of motion sickness and general wellbeing in VR. The responses of our participants indicated that they tolerated the experiment very well and experienced only weak symptoms, if any at all (see Supplemental Materials).

2.5 Motion data

To capture movements of the back in standing position, optical markers attached to the upper back and shoulder region were measured with the optical motion capture system that also tracked the 3d glasses for real-time rendering (four-camera system, OptiTrack, Corvallis, OR). We conducted quantitative analyses of two movement types (BS and RH), defining a functional range of motion (ROM) based on the amplitude of the respective oscillatory movement of the upper back/shoulders (Kammler-Sücker et al. 2021): the maximal rotations/deflections of the upper torso to both sides during the movement define a movement range in degrees for the current movement repetition. We treated each repetition as a separate data point, hence with maximally nine data points per session and movement. Sessions with poor overall tracking quality for the respective movement (fewer than three trackable ROM values) were excluded (11 sessions for BS, 0 for RH). We also checked the motion data for within-subject outliers and manually inspected the respective motion trajectories (4 outlier data points removed for BS, 3 for RH). Data of subjects with only one remaining session were excluded as a whole (3 for BS, 0 for RH). The final data sets consisted of 853 observations in 33 subjects for RH, and of 657 observations in 30 subjects for BS (due to tracking-related missing values). The software MATLAB 2019 (MathWorks, Natick, MA) was used for post-processing of motion data and cleaning of tracking artifacts.
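The ROM definition and the two exclusion rules can be sketched as below. This is an illustration under stated assumptions, not the MATLAB post-processing used in the study: it assumes that each repetition is available as a trajectory of signed torso angles in degrees (deflection to one side negative, to the other positive), and that untrackable repetitions are stored as None. How the angle is derived from the marker positions is not covered here.

```python
def range_of_motion(angles_deg):
    """ROM of one repetition: span between the maximal deflections to both sides."""
    return max(angles_deg) - min(angles_deg)


def usable_sessions(rom_by_session, min_values=3):
    """Keep sessions with at least `min_values` trackable ROM values.

    `rom_by_session` maps a session number to its list of ROM values,
    where None marks a repetition that could not be tracked (assumed format).
    """
    kept = {}
    for session, roms in rom_by_session.items():
        valid = [r for r in roms if r is not None]
        if len(valid) >= min_values:
            kept[session] = valid
    return kept


def include_subject(rom_by_session):
    """A subject stays in the data set only with more than one usable session."""
    return len(usable_sessions(rom_by_session)) >= 2


# Example with made-up values.
rom = range_of_motion([0.0, 12.5, -10.0, 3.0])
subject = {1: [30.0, None, 31.0, 29.0], 2: [None, None, 28.0, 27.0]}
print(rom, usable_sessions(subject), include_subject(subject))
```

In the example, session 2 has only two trackable values and is dropped, which leaves the subject with a single session and therefore excludes the subject for that movement.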

2.6 Statistical analysis

We used multilevel modeling to analyze both motion and self-report data based on linear mixed effects models (LMEMs), an extension of multiple regression for data sets with grouped structure. LMEMs allow for group-specific deviations (random effects) from grand-sample regression coefficients and intercepts (fixed effects), thus resulting in additional group-wise random slope and random intercept estimates. In our analyses, we allowed for a full covariance matrix of random effects and only included random intercepts. We employed the restricted maximum likelihood (REML) method to estimate model coefficients, except for analyses involving model comparisons via likelihood ratio tests, in which case the maximum likelihood (ML) method was used. Error estimation of t and p(t) values for fixed effects coefficients relied on the Kenward–Roger approximation method (Kenward and Roger 1997). Analyses of variance (ANOVAs) of the fixed effects in the LME models were also conducted with the Kenward–Roger method, as implemented in the R package lmerTest (Kuznetsova et al. 2017). If interactions were analyzed, a separate model including the interaction term(s) was analyzed in addition. If the latter revealed significant interactions, post-hoc contrast analyses of marginal means were applied, testing for differences between the levels of one variable with the other factor level held constant, and vice versa. Effect sizes were estimated by standardized regression weights βz (Hox et al. 2018, p. 18). For all LME analyses, the R package afex (Singmann et al. 2020) was used, which builds on the lme4 package (Bates et al. 2015), and model-based estimates for expected mean values (simple or marginal effects) were calculated with the package emmeans (Lenth 2020). In pairwise testing of group differences in interactions, correction for multiple testing used the false-discovery rate (FDR) correction (Benjamini and Hochberg 1995), as implemented in the multcomp package (Hothorn et al. 2008), applied separately for each direction of marginalization.
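For readers unfamiliar with the FDR correction, the Benjamini–Hochberg step-up adjustment can be written in a few lines. The sketch below is a generic, self-contained illustration of that procedure and not the R code used for the analyses; the function name and the example p values are made up.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# One family of contrasts, e.g. all pairwise session comparisons within a group.
adjusted = bh_adjust([0.01, 0.04, 0.03])
print([round(p, 3) for p in adjusted])
```

Applying the correction separately for each direction of marginalization corresponds to calling such a function once per family of contrasts (for example, once for the between-session contrasts and once for the between-group contrasts) rather than once on all p values pooled.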

For analyses of motion capture data, the basic LME models for both ROMBS and ROMRH were fitted separately, with the only predictors being treatment group (either AVA or VID) and session. The latter was modeled as a factor variable to account for varying inter-session intervals. This basic model was then extended by an interaction term between group and session. If the mixed-model ANOVA revealed this interaction to be significant, it was further analyzed by post-hoc comparisons of the estimated marginal means. In addition, we also analyzed extended models for ROMs to assess influences of other predictors and to check the validity of our results. We assumed that ROMs would be decreased by current pain state and pain expectancy, stimulated by identification with a model perceived to be comfortable in its movements, increased by immersion and presence, and influenced by demographic characteristics. Thus, the additional predictors were the MPI scale on pain intensity (interference scale had a correlation of 0.72 with this scale and was thus not added), pain expectancy (with a correlation of 0.41 with the FABQ physical activity score, so the latter was not included), AAQmm1 (identification) and AAQmm2 (situational pleasantness; AAQmm3 on changed body perception not included due to boundary effects toward zero), IPQ presence (total score, as suggested by high correlations of 0.43 to 0.56 between IPQ subscales), age in years, and gender. In general, these extended models did not change the outcomes of the basic models. Self-report measures were analyzed with similar models. The influence of our experimental manipulation on prior pain expectancy was modeled with a basic model involving session and experimental groups, as well as their interaction. Models for posterior self-reports included group, session and three additional predictors. 
These were pain expectancy, to detect effects of prior expectations, and the averaged ROMs for BS and RH, to assess potential correlations between motion capture and self-report. Relevant self-report outcomes regarded motor engagement (based on the NRS item on how far participants had gone to their limits), functional capacity, limitation of movements by pain, and pain during the movements.
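Written out, the basic random-intercept model for a ROM observation i of participant j could take the following form. The notation is ours and rests on two assumptions that are made only for readability: that participants are the grouping factor for the random intercept, and that group and session are dummy-coded (AVA versus VID, sessions 2 and 3 versus session 1). The contrast coding used by the software may differ, so the symbols need not map one-to-one onto the reported coefficients.

```latex
\mathrm{ROM}_{ij} = \beta_0 + \beta_1\,\mathrm{AVA}_j
  + \beta_2\,\mathrm{S2}_{ij} + \beta_3\,\mathrm{S3}_{ij}
  + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

The interaction model adds the terms \beta_4\,\mathrm{AVA}_j\,\mathrm{S2}_{ij} + \beta_5\,\mathrm{AVA}_j\,\mathrm{S3}_{ij}, and the extended models add further fixed-effect predictors in the same way.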

In post-hoc moderation analyses, we extended the basic models for ROMs and self-reported pain after the movements by adding a binary level of pain expectancy as a potential moderator. The level of pain expectancy for a session was defined as “low” for pain expectancy ratings of 0 or 1, and “high” for ratings of 2–4. To assess potential moderator effects of pain expectancy level on group effects, LME models with two-way interactions were analyzed. As moderation could also affect session-dependent group effects, three-way interactions between pain expectancy level, group and session were also included.
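The dichotomization of pain expectancy used for the moderation analyses amounts to a simple threshold on the 0 to 4 rating, as the following sketch shows (the function name is hypothetical).

```python
def pain_expectancy_level(rating):
    """Map a 0-4 pain expectancy rating to the binary level used as moderator."""
    if not 0 <= rating <= 4:
        raise ValueError("pain expectancy rating must lie between 0 and 4")
    # Ratings of 0 or 1 count as "low", ratings of 2 to 4 as "high".
    return "low" if rating <= 1 else "high"


levels = [pain_expectancy_level(r) for r in range(5)]
print(levels)
```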

3 Results

Descriptive statistics for all outcome variables can be found in Table 2. The raw data distributions after preprocessing for both lateral flexion (BS) and rotation in the horizontal plane (RH) are shown in the Supplemental Materials, Supplemental Figs. 1 and 2.

Table 2 Descriptive statistics of behavioral outcome variables
Fig. 2

Results of post-hoc contrast analysis for basic model for ROMRH. Raw data (subject-specific averages) are shown in gray and broken down by session, with AVA group indicated by spheres and VID by triangles. Model estimates for marginal means by group and session are shown in red (AVA) and blue (VID). Pairwise contrasts are depicted with respective p values as estimated with Kenward–Roger approximation and FDR correction (Benjamini and Hochberg 1995; Kenward and Roger 1997), with session-wise group comparisons in green and comparisons between sessions in red (AVA) and blue (VID). Note the decrease for AVA between second and third session, while the constantly lower values for VID do not change except for a marginally significant decline between first and second session. The quantitative values are reported in the Supplemental Materials, Supplemental Table 1 (colour figure online)

3.1 Range of motion

Parameter estimates of the LME models for ROMs (both for BS and RH) are reported in Table 3. For bending sideward (BS), there was no significant effect of treatment group on ROMBS, F(1,27.00) = 0.30, p = 0.59, βz = −0.11. However, there was a significant effect of session, F(2,625.28) = 26.15, p < 10⁻¹⁰, which was driven by a significant decline in ROM between the first and third session (βz = −0.11). There were no significant interactions between group and session, F(2,623.29) = 0.17, p = 0.84.

Table 3 Coefficients of basic model for ranges of motion (ROM)

In the basic model for ROMRH, the effect of AVA versus VID group had a positive sign, but did not reach significance, F(1,31.00) = 2.26, p = 0.14, βz = 0.23. Again, session had a significant effect, F(2,818.14) = 7.57, p < 0.001, driven by a significant decline in ROM between first and third session (βz = 0.08). In contrast to BS, there was a significant interaction between group and session, F(2,816.14) = 6.28, p < 0.01. Post-hoc contrast analyses (see Fig. 2, detailed results in Supplemental Materials, Supplemental Table 1) showed no significant effect of session in the VID group. In the AVA group, in contrast, there was a decline between session 2 and session 3. With respect to contrasts between groups, the consistently positive difference in marginal means between AVA and VID was not significant for any session.

The extended models for both RH and BS confirmed the main effects found in the basic models (see Supplemental Materials, Supplemental Table 3 for complete ANOVAs and Supplemental Table 4 for model coefficients). Although with small effect size, pain expectancy was a significant predictor for both BS, F(1,621.21) = 29.07, p < 0.001, βz = −0.09, and RH, F(1,822.58) = 4.10, p = 0.04, βz = −0.04. Besides this, the only other significant effect was exerted by AAQmm2 (situational pleasantness) on ROMRH, again with small effect size, F(1,818.43) = 5.62, p = 0.02, βz = −0.26. Pain state (MPI1 pain intensity), identification (AAQmm1), and presence (IPQ) did not show significant effects. Demographic characteristics showed a marginally significant effect in two cases, namely gender for ROMBS, F(1,25.99) = 3.79, p = 0.06, βz = −0.78 (male compared to female), and age for ROMRH, F(1,29.36) = 3.05, p = 0.09, βz = −0.27.

3.2 Pain expectancy

For pain expectancy, there was no significant main effect of treatment group, F(1,30.81) = 1.67, p = 0.21, βz = 0.17. The effect of session did not reach significance, F(2,60.57) = 2.16, p = 0.12. However, there was a significant interaction effect between group and session, F(2,60.57) = 3.33, p = 0.04. Post-hoc contrast analyses of estimated marginal means (see Fig. 3, detailed results in Supplemental Materials, Supplemental Table 2) revealed no significant effect of session in the VID group. In contrast, the AVA group showed a significant increase in pain expectancy between session 1 and session 2 (p = 0.03). The higher marginal means for group AVA compared to VID in the second and third session were not statistically significant (p = 0.13 for both sessions).

Fig. 3

Results of post-hoc contrast analysis for basic model for pain expectancy. Raw data (subject-specific averages) are shown in gray and broken down by session, with AVA group indicated by spheres and VID by triangles. Model estimates for marginal means by group and session are shown in red (AVA) and blue (VID). Pairwise contrasts are depicted with respective p values as estimated with Kenward–Roger approximation and FDR correction (Benjamini and Hochberg 1995; Kenward and Roger 1997), with session-wise group comparisons in green and comparisons between sessions in red (AVA) and blue (VID). Note the increase for AVA between first and second session, while the constantly lower values for VID do not change except for a marginally significant decline between first and third session. The quantitative values are reported in the Supplemental Materials, Supplemental Table 2 (colour figure online)

3.3 Engagement, pain and function

The LME analyses for participants’ self-reports after the movements revealed a similar pattern for all four variables, i.e., the engagement question on whether participants had gone to their limits, the functional capacity score, the average limitation of the movements by pain, and the average pain during the movements. Descriptive statistics per group and session for all these measures are shown in Table 2. We report the results of the mixed-model ANOVAs for all corresponding LME models in the following (model coefficients in the Supplemental Materials, Supplemental Table 5).

The intervention group had a marginally significant effect on the engagement item, F(1,26.97) = 3.22, p = 0.08. Here, the AVA group tended to report higher levels than the VID group, with a medium effect size βz = 0.30. With respect to pain during movement and functional capacity, there were no significant effects of intervention group (functional capacity: F(1,24.77) = 0.32, p = 0.58; limitation: F(1,25.68) = 1.09, p = 0.31; pain: F(1,25.91) = 0.27, p = 0.61).

Session did not show significant effects on any self-report after the movements, either for engagement (F(2,53.68) = 1.03, p = 0.36), for function (functional capacity: F(2,55.80) = 0.31, p = 0.74; limitation: F(2,50.63) = 2.19, p = 0.12), or for pain (F(2,50.31) = 1.16, p = 0.32). Motor behavior as assessed with ROM (for movements BS and RH separately, averaged by session) also had no significant effects on self-report measures. This held for engagement (predictor ROMBS: F(1,37.93) = 0.43, p = 0.52; predictor ROMRH: F(1,55.54) = 0.006, p = 0.94), functional capacity (ROMBS: F(1,28.76) = 0.07, p = 0.80; ROMRH: F(1,31.96) = 0.69, p = 0.41), limitation (ROMBS: F(1,30.82) = 0.84, p = 0.37; ROMRH: F(1,38.45) = 2.53, p = 0.12), and pain (ROMBS: F(1,31.87) = 0.86, p = 0.36; ROMRH: F(1,40.83) = 0.28, p = 0.60). Note that even when self-reports on pain and function were analyzed separately for each movement, ROMs did not show any significant effects (data not shown).

Prior pain expectancy, in contrast, was a significant predictor for all self-reports on pain and function (functional capacity: F(1,66.63) = 8.75, p < 0.01, βz = −0.25; limitation: F(1,72.48) = 10.78, p < 0.01, βz = 0.25; pain: F(1,71.56) = 10.49, p < 0.001, βz = 0.28). Similarly, there was a significant effect of prior pain expectancy on self-reported engagement, F(1,73.50) = 3.22, p = 0.03, βz = 0.17.
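
As a reading aid for the standardized coefficients βz reported above, the following minimal sketch shows what a standardized slope means in the simplest one-predictor case. It assumes that βz denotes a coefficient estimated on z-standardized variables and it ignores the random effects of the LME models, so it does not reproduce the analysis used here. The function names and the data are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Center values on their mean and scale them to unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def standardized_slope(x, y):
    """OLS slope of z-scored y on z-scored x (single predictor, no random effects)."""
    zx, zy = zscore(x), zscore(y)
    # With centered variables the intercept is zero, so the slope is sum(zx*zy) / sum(zx^2).
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)


# Hypothetical illustration data, not taken from the study.
expectancy = [1, 2, 3, 4, 5]
pain = [2, 1, 4, 3, 5]
beta_z = standardized_slope(expectancy, pain)
print(round(beta_z, 2))
```

In this simplified setting, the standardized slope equals the Pearson correlation between the two variables, which is why values such as 0.17 to 0.28 are usually read as small to moderate associations.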

3.4 Moderation analysis: group, pain expectancy and session

In the moderation analysis for the outcome variable ROMBS, all interactions with pain expectancy level were significant (group × pain expectancy level: F(1,615.55) = 6.39, p = 0.01; session × pain expectancy level: F(2,614.15) = 7.95, p < 0.001; group × session × pain expectancy level: F(2,614.15) = 3.85, p = 0.02). Post-hoc contrast analyses of the three-way interaction (Fig. 4) revealed the following pattern. In the AVA group, there was a mild decline in ROMBS for both levels of pain expectancy (low: significant decline between first and third session; high: significant decline between second and third session), and for none of the sessions was the difference in ROMBS between low and high pain expectancy levels significant. In contrast, the VID group showed a continuous decline of ROMBS between sessions (at least p < 0.01 for all pairwise comparisons) for measurements with high levels of prior pain expectancy, whereas the marginal means did not show any decline for low pain expectancy. Consistently, a difference in marginal means for ROMBS between low and high pain expectancy developed in the VID group in the second and third session (for both sessions at least p < 0.01), which had not been present in the first session. Contrasting the AVA and VID groups did not reveal any significant differences for any combination of session and pain expectancy level.

Fig. 4

Results of post-hoc contrast analysis for interaction model for ROMBS. Raw data (subject-specific averages) are shown in gray and broken down by session and group (experimental group AVA, and control group VID). Model estimates for marginal means are indicated for low and high pain expectancy levels. Pairwise contrasts are depicted with respective p values as estimated with Kenward–Roger approximation and FDR correction (Benjamini and Hochberg 1995; Kenward and Roger 1997), with session-wise comparisons in green and comparisons between sessions in blue (low pain expectancy) and violet (high pain expectancy). Note the decrease in ROM for high but not for low pain expectancy levels in the VID group; the AVA group, in contrast, does not show differences between pain expectancy levels
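
The figure captions state that the pairwise contrasts were FDR-corrected following Benjamini and Hochberg (1995). The sketch below illustrates how that step-up adjustment works on a family of p values. It is a minimal illustration under stated assumptions, not the software used in this study: the function name and the raw p values are hypothetical, and the Kenward–Roger approximation is not covered.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for four pairwise contrasts.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print([round(p, 3) for p in adj])
```

Each raw p value is multiplied by the number of tests divided by its rank, and the running minimum ensures that a contrast with a smaller raw p value never receives a larger adjusted p value than one with a larger raw p value.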

For ROMRH, all interactions involving pain expectancy level were at least marginally significant (group × pain expectancy level: F(1,813.80) = 4.16, p = 0.04; session × pain expectancy level: F(2,802.19) = 2.73, p = 0.07; group × session × pain expectancy level: F(2,802.19) = 2.41, p = 0.09). Post-hoc contrast analyses of the three-way interaction revealed a pattern similar to that found for BS (Fig. 5): the AVA group showed a continuous decline for both low and high pain expectancy (for low expectancy, at least marginally significant for both pairwise comparisons; for high expectancy, a marginally significant decline between first and third session), with no differences between expectancy levels for any session. In contrast, the VID group showed distinctly different patterns depending on pain expectancy: for low expectancy, a marginally significant drop in ROMRH between first and second session was followed by a significant increase in the third session compared to the second session, thus arriving at a level comparable to the first session. For high pain expectancy in the VID group, however, the marginal means showed a weak decline over sessions (marginally significant for first versus third session). Consistently, after the first two sessions without differences between pain expectancy levels, the third session showed a highly significant difference in marginal means for ROMRH between low and high levels of pain expectancy. Contrasting the AVA and VID groups did not reveal any significant differences for any combination of session and pain expectancy level.

Fig. 5

Results of post-hoc contrast analysis for interaction model for ROMRH. Raw data (subject-specific averages) are shown in gray and broken down by session and group (experimental group AVA, and control group VID). Model estimates for marginal means are indicated for low and high pain expectancy levels. Pairwise contrasts are depicted with respective p values as estimated with Kenward–Roger approximation and FDR correction (Benjamini and Hochberg 1995; Kenward and Roger 1997), with session-wise comparisons in green and comparisons between sessions in blue (low pain expectancy) and violet (high pain expectancy). Note the decrease in ROM for high but not for low pain expectancy levels in the VID group; the AVA group, in contrast, does not show differences between pain expectancy levels but only a generic weak overall decline over sessions (colour figure online)

For self-report, the three-way interaction models with predictors group, session, and pain expectancy level did not reveal significant interactions (detailed results in Supplemental Materials, Supplemental Table 6), except for pain during the movements. For this variable, the two-way interaction group × pain expectancy level was significant, F(1,76.30) = 6.99, p < 0.01, and was hence further analyzed with post-hoc contrasts. The interaction of session and expectancy level was marginally significant, F(2,62.87) = 3.02, p = 0.06. The three-way interaction of these variables, however, was not significant, F(2,61.87) = 1.49, p = 0.23.

The post-hoc contrasts for the interaction of intervention group and pain expectancy level (Fig. 6) showed a significant prediction of pain during movement by prior pain expectancy in the VID group. In contrast, no such relationship between pain expectancy level and subsequently reported pain was present in the AVA group.

Fig. 6

Results of post-hoc contrast analysis for interaction model for pain during movements. Raw data (subject-specific averages) are shown in gray and broken down by group (experimental group AVA, and control group VID) and pain expectancy level. Model estimates for marginal means are indicated by dots (AVA) and triangles (VID). Pairwise contrasts are depicted with respective p values as estimated with Kenward–Roger approximation and FDR correction (Benjamini and Hochberg 1995; Kenward and Roger 1997), with group comparisons in green and comparisons between expectancy levels in red (AVA group) and blue (VID group). Note how treatment group acts as a moderator of the relationship between level of prior pain expectancy and pain during movements (colour figure online)

4 Discussion

4.1 Effects of model personalization

The aim of this study was the exploration of model personalization and its effects on motor performance and engagement, pain expectancy, pain and function. With respect to motor performance as measured by range of motion (ROM), no significant group differences were found, although for rotation in the horizontal plane (RH), there was a trend approaching marginal significance. The small effect size (βz = 0.23) suggests that larger sample sizes may be necessary to detect a significant effect here. For the other movement, bending sideward (BS), there was no significant group effect or trend. The same held for self-reports on pain and function after the movements. Thus, our data could not confirm a group effect on ROM, pain and function as hypothesized.

Participants’ self-perceived engagement, in contrast, showed a marginally significant effect of group, with the AVA group reporting higher levels on the question of how far they had gone to their limits. The weak but significant effect of pain expectancy (βz = 0.17) on self-reported engagement probably reflects a specificity of this item. As it was formulated relative to the subjects’ perceived limitations, subjects probably scaled their response to their perceived limits of ability and pain tolerance. Participants with generally higher pain levels, reflected in higher pain expectancies, would thereby report higher “relative engagement” for a specific level of activity if they sustained it despite adverse effects.

For pain expectancy itself, our findings were contrary to our initial hypothesis: starting from the same level of pain expectancy as the VID group, the AVA group expressed a significantly increased pain expectancy in the second and equally the third session. In contrast, the VID group did not show changes in pain expectancy. Moderation analyses shed some light on these seemingly ambiguous results. They revealed that the AVA group showed a “decoupling” of pain expectancy from the effects on motor performance (ROMBS and ROMRH) and self-reported pain (pooled for BS, RH, and crate-moving) that were observed in the control group. In the VID group, high pain expectancy predicted higher experimental pain, and over the sessions, motor performance levels diverged for high versus low pain expectancy, such that the difference in ROM levels became significant in the last session. This can be interpreted as an example of avoidance elicited by pain expectancy. In the AVA group, in contrast, these patterns were not observed, with prior pain expectancy level lacking any significant effect on reported pain and motor performance over sessions, arguably reflecting a decoupling of avoidance behavior from pain expectancy.

4.2 Potential mechanisms and future research

Intervention with doppelganger models decoupled motor behavior and pain from prior pain expectancy. The effect was accompanied by a seemingly conflicting increase in pain expectancy over sessions. This could be a consequence of muscle ache after the first session, which would have confirmed and reinforced prior expectations if participants had engaged strongly despite high pain expectancy (as also indicated by anecdotal remarks of participants). This would also match the marginally significant effect of higher self-reported engagement in the AVA compared to the VID group. Members of the VID group, in contrast, may have engaged less beyond their comfortable range of motion, and may thus have experienced less muscle pain after the sessions. This would also match the absence of changes in ROMRH over sessions in this group. With respect to the AVA group, in contrast, it is noteworthy that the decoupling effect lasted over all three sessions (although it showed a decrease in ROMRH in the third session). This can be interpreted as a stimulation of pain tolerance and task persistence despite pain, i.e., a positive effect on participants’ ability to ignore and disregard concurrent nociception during movements, and to persevere in performing the task.

Several potential mechanisms may be at work here. Increased imitative tendencies, as shown in real-world and virtual chameleon and imitation effects (Chartrand and Bargh 1999; Kammler-Sücker et al. 2021), might have counteracted motor avoidance; however, this explanation is not supported by our extended models, which did not find effects of AAQmm1 identification score on motor behavior. Alternatively, an attention-grabbing doppelganger might have stimulated motivation. However, the lack of gamification elements would rather weigh against this explanation. In both movements, a small decline in ROM over sessions was significant. This suggests an overall trend of declining motivation during repeated sessions in our setup.

An observational placebo effect that generalized over sessions is also not supported by our data, as it would have been accompanied by a decrease in pain expectancy. However, a short-term within-session placebo overridden by between-session muscle ache would be consistent with the results. Theoretically, placebo effects may also have occurred independently of explicit expectancies, as has been shown for classical placebo effects (Bąbel 2019). However, whether effects of vicarious experience, such as observational placebo (Colloca and Benedetti 2009; Schenk et al. 2017), are possible without cognitive processing remains a matter of debate concerning the neural underpinnings of imitation (Bandura 1986; Duffy and Chartrand 2015; Greer et al. 2006; Hamilton 2015; Tomasello 2016; Zentall 2006).

Alternatively, observation of moving doppelgangers could have decreased harm expectancy (Crombez et al. 1999), in the sense of expectation of detrimental effects from the movements. Our data support this hypothesis as the apparent wellbeing of the movement model, assessed with the AAQmm2 “situational pleasantness” scale, had a significant positive effect on ROMRH (and a marginally significant effect on ROMBS). Observing the doppelgangers may also have increased self-efficacy expectations, which are distinct from control beliefs (Bandura 1977) and can facilitate pain tolerance on their own (Litt 1988). In our experiment, they might have increased task persistence despite limited control of pain. Self-efficacy is amenable to vicarious experience (Bandura 1977, 1998) and regularly addressed in CBT interventions (cf. Flor and Turk 2011). The closely related construct of pain resilience (Slepian et al. 2016) has been found to decouple motor performance from fear avoidance beliefs in CBP (Palit et al. 2020). Further research on “doppelganger facilitation” could investigate these possible links between task persistence, self-efficacy, and pain resilience. Therefore, future studies should assess the relevant psychological measures in this regard, for example, by administering the Pain Self-Efficacy Questionnaire (PSEQ) (Nicholas 2007).

4.3 Clinical application

Decoupling fear from avoidance has been suggested as a specific leverage point for exposure treatments in chronic pain (Gatzounis et al. 2021), hence circumventing pain expectations which can be more persistent than avoidance (Janssens et al. 2019) or harm expectations (Riecke et al. 2020). Making use of current advancements in accessible avatar personalization (Bartl et al. 2021; Wenninger et al. 2020), virtual doppelgangers may thus provide a viable tool to address the vicious cycle of fear and avoidance. More accessible technologies for avatar personalization would also allow for hypothesis-driven replication studies for the exploratory findings discussed above. Until then, these findings, as they are not in accordance with our initial hypothesis, should be viewed as preliminary.

If future studies on virtual doppelgangers in chronic pain corroborate a decoupling of fear from avoidance and an increase in pain tolerance and task persistence, this would add a valuable tool to VR-based treatments of CBP, thus expanding treatment strategies such as distraction, changes in perceptual and neural body representation, and movement exposure with gamification elements (Bordeleau et al. 2022; Tack 2021). Virtual doppelgangers, which stimulate mechanisms of observational modeling and social learning, may add a complementary research strand to changing body perception by avatar embodiment in first-person perspective (Matamala-Gomez et al. 2019a, b). Our participants’ self-reports on motion sickness and general wellbeing suggest that virtual doppelgangers can offer a well-tolerated way of potential treatment. This would facilitate an advanced type of serious VR games for CBP and other chronic pain conditions, a type of doppelganger trainer exergames, combining the principles of distraction and reward-based gamification in exergames (France and Thomas 2018; Jansen-Kosterink et al. 2013; Stamm et al. 2020) with VR exposure techniques (Hennessy et al. 2020) and the observational effects on pain perception, tolerance and task persistence observed in our experiments.

4.4 Limitations

The lack of reward may be viewed as a limitation of this movement study, as reward for movement would probably have amplified effect sizes. However, rewards were deliberately left out for the purpose of isolating the effect of model personalization. Future investigations of virtual doppelgangers in chronic pain might extend this with gamification techniques.

Another potential limitation of our sample is adverse selection, as persons with high fear of movement or general anxiety might have been hesitant to participate in a movement study under pandemic conditions. All participants ranged above the upper threshold (70%) for clinically relevant disability on the FFbH (Kohlmann and Raspe 1996). This is in line with the rather low GCPS pain grades in our sample, which had median values between 1.5 and 2 (see Table 1). These values lie between the GCPS categories (Von Korff et al. 2020) of “mild chronic pain” (1) and “bothersome chronic pain” (2), in contrast to “high impact chronic pain” (3). Based on the current proof-of-principle study, future studies of similar design should aim at recruiting more severely impaired participant samples and should specifically screen participants for movement-related fear of pain, since these patients might especially profit from this type of intervention (Hasenbring et al. 2012; Wertli et al. 2014). The presumably higher interference of fear avoidance with movements might create larger behavioral effects, allowing for a more rigorous testing of the hypotheses derived from the current study. Future studies should also seek to investigate measures of technology acceptance with respect to VR treatment in severely impaired clinical samples.

Closely related to this is the limitation that the current design might have failed to address the movements feared most by participants. Future studies may address this with a personalized assessment of movement-specific fears and a corresponding adaptation of the presented movements. However, the efforts required by real-world motion capture of human models as well as the need for clearly defined ranges of motion for quantitative analysis limited the range of movement sets in the current exploratory study.

Another limitation of our study relates to possible demand characteristics, as participants of the AVA group noticed the effort that was put into avatar generation. However, we tried to minimize this effect with an equally detailed virtual environment for the VID group and by not revealing the other condition until completion of the experiment.

In addition, the pandemic-related high variation in inter-session intervals, some of which were considerably longer than the originally intended two-week margin, adds a further limitation to the results of this study. It probably decreased the power to detect any effects on clinical pain variables. Future studies should schedule sessions strictly, with inter-session intervals of at most one week, to intensify treatment effects and potentially allow for transfer to daily-life activities.

Based on our discussion above, future research might profit from including more severely impaired samples and from administering questions on harm expectancies to differentiate these from pain expectancies. Implicit physiological markers of pain could clarify the role of momentary placebo effects. For example, cortical hemodynamic activity might be measured by functional near-infrared spectroscopy during painful movements (Öztürk et al. 2021).

5 Conclusion

Virtual doppelgangers as movement models might provide an additional tool to current cognitive-behavioral treatments in chronic back pain and could potentially be included in future exposure setups. In virtually duplicating the observer’s body, they may create a learning situation in between first-person and vicarious third-person experience, facilitating task persistence and decoupling movement avoidance and experienced pain from prior expectancies. Future research should address replicability of these findings and investigate underlying mechanisms in this new type of virtual stimuli.