Introduction

One of the most critical determinants of successful learning and students’ trajectories is academic self-concept, that is, the evaluation of one’s own academic abilities (Marsh & Shavelson, 1985). Academic self-concept has been found to be positively associated with individual achievement (Trautwein & Möller, 2016; Valentine et al., 2004), achievement emotions (Pekrun et al., 2019), academic choices, and career aspirations (Göllner et al., 2018; Nagengast & Marsh, 2012). Importantly, students’ evaluations of their own academic abilities are substantially shaped by their peers and the reference group they are surrounded by (e.g., classmates; Buunk & Gibbons, 2007; Dijkstra et al., 2008; Festinger, 1954). The most prominent and empirically well-supported pattern of results in the context of social comparison effects on self-concept is the Big-Fish-Little-Pond Effect (BFLPE; Marsh, 1987). The BFLPE suggests that unless students are “the big fish in the pond” (i.e., above the average performance level of their reference group), they are constantly exposed to comparisons with more able peers and consequently tend to evaluate themselves as less competent than they would in a lower achieving class (Marsh & Hau, 2003; Marsh et al., 2021; Nagengast & Marsh, 2012; Seaton et al., 2009, 2010). In other words, two equally able students will evaluate themselves differently depending on the performance level of their classmates, whom they use as a reference for their self-evaluation.

Notably, BFLPE research typically uses achievement data and students’ self-reports of their domain-specific academic self-concepts. Such data are usually collected in real-world classrooms. Given the limited opportunities to implement far-reaching experimental manipulations in schools, BFLPE research typically has to rely on nonexperimental designs. Such correlational research is limited in the extent to which it can provide causal evidence for the proposed (social comparison) processes; moreover, with few exceptions (Huguet et al., 2009; Marsh et al., 2014), social comparison processes as the BFLPE’s underlying mechanisms are typically deduced rather than observed or manipulated. Hence, despite an abundance of empirical support for the BFLPE, surprisingly little is known about how social comparisons in the classroom come about and ultimately impact students’ self-concept (Dai & Rinn, 2008).

To address these open questions, we used immersive virtual reality (IVR) as a methodology that can combine the advantages of field research in real classrooms with the affordances of experimentally controllable settings. Our objective in implementing this study was to simulate a classroom situation involving virtual classmates with different performance levels so that we could systematically examine how these different reference groups affect students’ perceptions of the class’ performance and their own self-concepts. Specifically, we examined (a) whether different proportions of high achievers in an IVR classroom situation would lead to distinct perceptions of a class’ performance level and (b) whether the empirically well-supported effects of social comparisons on students’ self-concepts (i.e., the BFLPE) could be reproduced in such an experimental setting. Moreover, we examined (c) how the perceived performance level of the class affected the relationship between the manipulated proportion of high-achieving classmates and the expected effects on students’ self-concepts.

Open questions in traditional BFLPE research

BFLPE research to date has struggled to provide clear answers and empirical evidence concerning three aspects of the proposed social comparison effects, namely, their causality, their underlying mechanisms, and their temporal development.

Speaking to the causality of the proposed effects, experimental support for the BFLPE is scarce (Zell & Alicke, 2009, 2010). Aiming to identify causal relationships between social comparison information and the resulting differences in self-concept, experimental research (predominantly grounded in social psychology) has tended to use lab-based settings, which clearly differ from authentic classroom environments. A first issue is that by offering manipulated performance feedback as social comparison information (Pyszczynski et al., 1985; Wheeler, 1966) or by introducing participants to other given (usually fictitious) comparison targets (Mussweiler, 2003; Mussweiler et al., 2004; Neugebauer et al., 2016), many experimental studies test effects of social comparisons that are artificially induced rather than naturally observed. Moreover, many experimental studies explicitly instruct students to make a comparison with a specific comparison target of their choice (Dumas et al., 2005; Huguet et al., 2001; Suls & Wheeler, 2000), and therefore, students’ attention is drawn specifically toward a single social comparison target, a process that might not unfold in the same way in a natural setting. Taken together, experimental findings, although they vary with the experimental design and the specifics of the manipulation, have suggested that distinct reference groups and the respective social comparisons are in fact the cause of the BFLPE. However, the extent to which social comparisons are causally linked to observed effects on self-concept in a real-world, and therefore far more complex, classroom setting remains unclear.

In addition to these open questions about the BFLPE’s causality, when researchers declare that social comparisons are the mechanism that underlies the BFLPE, they are usually stating an indirect conclusion rather than one that resulted from an explicit examination of that mechanism. Ever since social comparisons were first identified in Festinger’s influential social comparison theory (Festinger, 1954), they have been widely acknowledged as a central aspect of human interaction (Buunk & Gibbons, 2007; Buunk & Mussweiler, 2001) and are particularly strongly linked to self-evaluations (Blanton et al., 1999; Buunk et al., 2007; Dumas et al., 2005; Huguet et al., 2001). Hence, from a theoretical perspective, students constantly compare themselves during learning and most likely use the best possible information available in any classroom situation to do so (Dijkstra et al., 2008); however, this tendency is scarcely reflected in research on social comparison effects in the classroom. A few studies have examined the role of social comparisons as mediators of the BFLPE, for instance, by including the achievement of individually selected classmates or self-reports of students’ perceived relative standing in class in the analyses (in addition to the class’ average level of achievement, which is typically used; Huguet et al., 2009; Marsh et al., 2008, 2014). However, similar to the experimental designs, the indicators of social comparisons used in these studies do not reflect actual (i.e., naturally occurring) social comparison behavior in the classroom. In sum, BFLPE research has hardly ever focused on actual classroom behavior and the role of students’ individual perceptions thereof.

Moreover, BFLPE research has not yet answered the question of how exactly, and even more so, how quickly certain perceptions of classmates impact students’ self-concept. From a theoretical point of view, it has been argued that the results of social comparisons in specific situations eventually result in a more stable (i.e., dispositional, trait-like) self-concept (Marsh & Shavelson, 1985; Shavelson et al., 1976). This idea is also in line with models from personality psychology that link short-term situational processes with long-term personality development (e.g., Wrzus & Roberts, 2017). For example, in a computational thinking class, the teacher might ask a question about how a computer code works, and many or hardly any members of the class might indicate that they know the correct answer. In situations like these, a student has a specific experience of being less or more able than their classmates. Experiences such as these are assumed to lead to short-term effects on the student’s self-concept (i.e., situational self-concept). When asked about how they evaluate their own computational thinking skills during the lesson, the student’s response will be substantially influenced by their perception of the classroom situation they just experienced. In the next lesson, if the student has a different experience, their response will likely be different. However, if the student repeatedly experiences similar successes or failures in this class and repeatedly evaluates themself as more or less able than their classmates, this pattern of situational self-evaluations will turn into a relatively stable idea of the student’s own ability in computational thinking (i.e., dispositional domain-specific self-concept), which ultimately is likely to generalize to broader self-evaluations in the domain, such as concerning computing skills or technological competencies in general (Suls & Mullen, 1982; Wigfield et al., 2015). 
To date, BFLPE research has mostly focused on students’ general domain-specific self-concepts. Considering that specific contexts and comparisons with students’ current reference groups substantially shape self-concept (e.g., Becker & Neumann, 2016, 2018), and attesting to the situation-sensitivity of competence beliefs (Eccles & Wigfield, 2020), it seems crucial to go beyond investigating social comparison effects only on general domain-specific dispositional self-concepts by additionally considering situation-specific perceptions of the self.

Immersive virtual reality as an experimental tool

The need to go beyond the existing designs that have been used to study the BFLPE and its associated social comparisons has been widely acknowledged. However, these challenges have been difficult to overcome in the absence of a methodological approach that allows for both experimental control and an authentic classroom situation. To address questions about causality and to uncover the mechanisms that underlie the BFLPE, there is a need for an ecologically valid research tool that permits the isolation and systematic variation of relevant variables (i.e., actual performance-related classroom behavior) while simultaneously preserving the authenticity and realism of a classroom situation. Immersive virtual reality (IVR) allows for exactly these things: realistic simulations and experimental designs (Blascovich et al., 2002; Parsons, 2015).

In general, IVR presents a computer-generated simulated environment that allows for realistic perceptions and seemingly real interactions in an artificial virtual world (Blascovich et al., 2002). Experiences in virtual reality environments can be described on a number of different dimensions (e.g., International Society for Presence Research, 2000; Lee, 2004; Lombard et al., 2009; Schubert et al., 2001; Slater, 1999; Witmer & Singer, 1998). To define the understanding of IVR classroom environments in the present study, we use three central constructs that have repeatedly appeared in the respective discussions: immersion, presence, and realism. Immersion primarily refers to the technical features and the multimodality of virtual reality, involving state-of-the-art virtual reality set-ups that rely on so-called head-mounted displays (HMDs) that enclose the entire eye area and are equipped with noise-canceling headphones to shut out stimuli from the real world as much as possible (see, e.g., Fox et al., 2009; Radianti et al., 2020). Presence describes the psychological state and subjective perception of being in the virtual environment, leading to the impression of IVR environments as “places visited rather than as images seen” (Slater & Wilbur, 1997, p. 3), entailing (a) a spatial perception of actually being in the virtual environment and (b) a social perception of being with others in the virtual environment (Lombard et al., 2009; Oh et al., 2018; Schubert et al., 2001; Slater & Wilbur, 1997). Realism refers to the “realness” (Schubert et al., 2001, p. 271) of the IVR environment in terms of how similar objects, people, and events in IVR are compared with those in the physical world (International Society for Presence Research, 2000; Schubert et al., 2001).
Particularly in recent years, fast-paced technical developments in the field of software and hardware development have led to extremely realistic simulations that physically and mentally immerse users in surroundings that they can vividly experience just like the real world (Cummings & Bailenson, 2016; Makransky & Lilleholt, 2018; Slater & Sanchez-Vives, 2016).

The application of IVR began in the gaming sector and was quickly adopted by the military and health sectors, particularly for practicing the handling of situations that are difficult or impossible to simulate in real life (e.g., Blascovich et al., 2002; Carl et al., 2019; Parsons & Mitchell, 2002; Richards, 2017). Similarly, educational psychologists have started to use IVR as a tool for teacher training (Dieker et al., 2007; Huang et al., 2021; Lugrin et al., 2016; Richter et al., 2022). Beyond its application for training purposes, more and more disciplines have recognized and begun to use the enormous potential of IVR as a methodological tool that allows researchers to combine experimental control and authenticity in their research designs (e.g., Bailenson et al., 2008; Blascovich et al., 2002; Lanier et al., 2019). The use of IVR as a research tool has proven to be suitable and promising, especially in younger participants, who tend to be cognitively and behaviorally more responsive to IVR environments than adults (Bailey & Bailenson, 2017; Hite et al., 2019; Southgate et al., 2017). Against this background, IVR is particularly interesting for the learning sciences as it offers an experimental approach for conducting systematic and in-depth examinations of basic pedagogical-psychological theories in the otherwise uncontrollable classroom context (Bailenson et al., 2008; Blume et al., 2019; Kizilcec et al., 2015).

The present study

In the present study, we examined how individual differences in students’ academic self-concepts emerge and can be traced back to social comparison processes in an IVR classroom. More specifically, we investigated whether and to what extent students recognize classmates’ performance-related behavior (i.e., hand-raising) as social comparison information and how these perceptions explain individual differences in students’ self-concepts. We used an IVR classroom to implement a research design that allowed for the strict experimental control of social comparison information but preserved the authenticity of a real classroom situation.

In our IVR classroom environment, students learned the basic principles of computational thinking while surrounded by virtual classmates with systematically varied performance levels. We used the virtual classmates’ hand-raising behavior as the indicator of performance (e.g., Böheim et al., 2020) and manipulated the proportions of classmates who raised their hands to answer the teacher’s questions or otherwise indicated that they knew the correct solution to a task. The IVR setting made it possible for every participant to experience the exact same 15-min IVR lesson where the only difference was the performance of the virtual classmates. With this experimental manipulation, we aimed to test the typically found reference-group effect (i.e., the BFLPE) on participants’ self-concepts in a controlled and authentic environment. We aimed to use the IVR setting to gain insights into the emergence of effects of the reference group on students’ self-concepts. Therefore, we examined situational and dispositional self-concepts in the domain of computational thinking. We expected to see situation-specific effects on situational self-concept, and we furthermore explored whether the 15-min exposure to larger or smaller proportions of high-performing virtual classmates would (already) have an impact on dispositional self-concept.

In a first step, we examined whether and to what extent participants (a) recognized their classmates’ hand-raising behavior and (b) viewed this information as an indicator of classmates’ performance. We therefore assessed (a) the number of registered hand-raising classmates (to capture the extent to which students paid attention to their classmates’ behavioral responses to the teacher’s questions) and (b) participants’ perceptions of the class’ performance (to capture the extent to which students interpreted their classmates’ hand-raising behavior as an indicator of performance). In a second step, we examined the impact of classmates' hand-raising behavior (i.e., the proportion of hand-raising classmates) on individual participants’ self-concepts. We asked:

  1. To what extent is classmates’ hand-raising behavior predictive of (a) the number of hand-raising classmates that participants register and (b) the perceived performance level of the class?

  2. How is classmates’ hand-raising behavior related to individual participants’ situational and dispositional domain-specific (i.e., computational thinking) self-concepts?

We expected that the different proportions of hand-raising classmates would positively predict (a) the reported number of registered hand-raising classmates and (b) the perceived performance level of the class (Hypothesis 1). In line with the typical results of BFLPE research, we expected a negative effect of classmates’ hand-raising behavior on participants’ self-concepts, particularly their situational self-concept (Hypothesis 2). We furthermore explored whether the 15-min-long exposure to high-achieving classmates would lead to differences in participants’ dispositional self-concept.

Finally, provided that effects of the hand-raising conditions on participants’ self-concepts were found, we aimed to gain more insights into the mechanism underlying the BFLPE. Here, the reported number of registered hand-raising classmates (Research Question 1a) and the perceived performance level of the class (Research Question 1b) were assumed to reflect two different aspects involved in social comparisons (the reported number of registered hand-raising classmates is a rather objective rating, whereas participants’ perceptions of the class’ performance level reflect a subjective judgment). To provide a comprehensive picture of how students’ perceptions of their classmates’ hand-raising behavior affected their self-evaluations, both variables were considered. More specifically, we asked:

  3. Are observed effects of classmates’ hand-raising behavior on participants’ self-concepts explained by their perceptions of the class’ performance level (i.e., the reported number of registered hand-raising classmates and the perceived performance level of the class)?

We expected that the reported number of registered hand-raising classmates and the perceived performance level would fully mediate the relationships between the proportion of high-achieving classmates and participants’ self-concepts (Hypothesis 3). In line with Hypothesis 2, we expected this effect particularly for situational self-concept as an outcome.

Figure 5 provides an overview of the theoretical structural model.
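
As a rough orientation only, the mediation hypothesis can be written as a set of regression equations. This sketch assumes that the two perception variables act as parallel mediators; the symbols (X for the manipulated proportion of hand-raising classmates, M_1 for the reported number of registered hand-raising classmates, M_2 for the perceived performance level, Y for self-concept) and the path labels are illustrative and are not taken from the authors' model specification, which may differ.

```latex
% Illustrative parallel-mediation sketch (assumed, not the authors' exact model)
\begin{align}
M_1 &= i_1 + a_1 X + \varepsilon_1 \\
M_2 &= i_2 + a_2 X + \varepsilon_2 \\
Y   &= i_3 + c' X + b_1 M_1 + b_2 M_2 + \varepsilon_3 \\
\text{total indirect effect} &= a_1 b_1 + a_2 b_2
\end{align}
```

Under this reading, full mediation (Hypothesis 3) would correspond to a total indirect effect that differs from zero while the direct path c' is close to zero.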

Method

The study and data collection were approved by regional educational authorities and the ethics committee of the University of Tübingen, who confirmed that the procedures were in line with ethical standards of research on human subjects (date of approval: 11/25/2019, file number: A2.5.4-106_aa).

Sample

The recruited sample consisted of N = 381 students in Grade 6. Data from 28 participants had to be excluded due to technical issues during data collection (i.e., mostly the unexpected crashing of HMDs and computers or audio issues in the middle of the IVR experience). The final sample consisted of N = 353 students (MAge = 11.52 years, SDAge = 0.55; 46.7% girls) from a total of 25 classes at 12 different schools.

An a priori power analysis was computed to determine the required sample size by considering existing findings from experimental studies (Möller & Köller, 2001, Study 1: d = 0.85, Study 3: d = 1.37; Wolff et al., 2018, Study 1: d = 0.73). Because the explicit performance feedback used in these studies presumably produces larger effects than would be expected from varying hand-raising behavior in the present study, small to medium effects (f = 0.20) were assumed for the power analysis. The sample size necessary for each of the four conditions was determined to be n = 90 students for the respective analyses of variance (for two-tailed tests with a 0.05 alpha level and a minimum power of 0.90).
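
The logic of such a power analysis can be sketched in a few lines. The text does not say which software or exact test specification was used, so the following is an assumption-laden illustration: it treats the analysis as a one-way ANOVA omnibus F test with four groups, uses Cohen's f with noncentrality λ = f² · N, and evaluates the noncentral F distribution with a hand-written incomplete beta function. All function names are hypothetical.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def anova_power(f_effect, n_per_group, k_groups, alpha=0.05):
    """Power of the omnibus F test in a balanced one-way ANOVA (assumed design)."""
    n_total = n_per_group * k_groups
    d1, d2 = k_groups - 1, n_total - k_groups
    half_lambda = f_effect ** 2 * n_total / 2.0
    # Critical value on the beta scale x = d1*F / (d1*F + d2), found by bisection
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if reg_inc_beta(d1 / 2.0, d2 / 2.0, mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    x_crit = (lo + hi) / 2.0
    # Noncentral F CDF as a Poisson-weighted sum of incomplete beta terms
    cdf, weight = 0.0, math.exp(-half_lambda)
    for j in range(1000):
        cdf += weight * reg_inc_beta(d1 / 2.0 + j, d2 / 2.0, x_crit)
        weight *= half_lambda / (j + 1)
        if j > half_lambda and weight < 1e-16:
            break
    return 1.0 - cdf

def required_n_per_group(f_effect, k_groups, power=0.90, alpha=0.05):
    """Smallest per-group n whose power reaches the target."""
    for n in range(2, 10000):
        if anova_power(f_effect, n, k_groups, alpha) >= power:
            return n
    raise ValueError("target power not reached")

if __name__ == "__main__":
    print(required_n_per_group(0.20, 4, power=0.90, alpha=0.05))
```

Under these assumptions, the per-condition sample size returned for f = 0.20, alpha = 0.05, and power = 0.90 should fall close to the 90 students per condition reported above; a different test specification would give a somewhat different number.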

Procedure

Participating students were recruited from local secondary schools (so-called Gymnasium schools, which are attended by about 50% of students of this age in southern Germany) via e-mails and invitation letters. After obtaining written informed consent from both the students and their parents or legal guardians, all students who indicated interest were admitted into the study. Students were tested in groups of up to 10 with all test sessions taking place at the participants’ schools (see Fig. 1 for an impression).

Fig. 1

Participating students tested in groups of up to 10 (pictures: Gabriele Loges). Note: Participants’ IVR systems were not linked; hence, each participant independently experienced the IVR classroom situation surrounded by the simulated virtual peer learners

For each of the up to 10 students in a group, head-mounted displays (HMDs; HTC Vive Pro Eye) were set up prior to the test session. The set-up included selecting one of the experimental conditions, whereby random numbers were generated at an individual level and used to randomly distribute the experimental conditions within and across the test groups. Students were free to choose any seat when they entered the testing room without knowing about the different experimental conditions. The experimental conditions differed only with respect to the specific characteristics that were manipulated in the virtual classroom situation.
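
Individual-level random assignment of this kind can be illustrated with a minimal sketch. It assumes simple (unblocked) randomization with equal probabilities for the four hand-raising conditions; the text does not specify whether blocking or balancing was applied, and the names `CONDITIONS` and `assign_conditions` are hypothetical.

```python
import random

# Percentages of hand-raising virtual classmates in the four conditions
CONDITIONS = (20, 35, 65, 80)

def assign_conditions(n_headsets, seed=None):
    """Draw one condition per headset, independently and with equal probability."""
    rng = random.Random(seed)  # a seed makes the draw reproducible for documentation
    return [rng.choice(CONDITIONS) for _ in range(n_headsets)]

if __name__ == "__main__":
    # One test session with up to 10 headsets (assumed example)
    print(assign_conditions(10, seed=2019))
```

Because students chose their seats freely and without knowledge of the conditions, assigning conditions to headsets in this way would be sufficient to randomize participants across conditions.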

In total, each test session took approximately 45 min and consisted of three main parts: First, participants completed a paper-based pretest, which included demographics, relevant personality characteristics, and students’ learning backgrounds. Second, participants experienced the IVR lesson, which was introduced as a learning experience without any reference to possible social comparisons with virtual classmates. Importantly, even though participating students in a testing group experienced the IVR lesson at the same time, their IVR systems were not linked in any way, and participants were clearly instructed that once they put on the HMDs, each of them was going to be in their own IVR classroom, independent of their classmates in the real world. Third, as soon as participants finished the virtual learning experience, they completed a paper-based posttest questionnaire followed by a debriefing. The posttest included scales for measuring participants’ self-concepts and their experience with the IVR lesson (i.e., perceptions of their virtual classmates and the virtual class’ performance level).

The immersive VR classroom

Lesson content and design of the IVR classroom

The IVR lesson’s content was adapted from tested and evaluated materials from a course that was designed to teach children basic computational thinking skills (titled “Understanding how computers think”). Computational thinking generally describes the ability to sequence a problem or task into substeps, to formulate solution steps, and to use a computer for this purpose (Weintrop et al., 2016). In the 15-min IVR classroom experience, the students learn about the meaning of coding and about sequences and loops as basic computational concepts. At the beginning of the lesson, the teacher gives a short introduction to the topic and asks a number of open questions (e.g., “Who can explain what coding means?”; “Who has heard the word sequence before?”; “Who knows an example of something in everyday life that works like a loop?”). The questions are followed by two exercises that test the use of the concepts “sequence” and “loop” (adapted from the Computational Thinking test by Román-González et al., 2017). The students have to choose the correct answer from four options, and after each task, the teacher checks the class’ performance by going through the answer options and asking who thinks each of the options is the correct answer. Finally, the IVR lesson concludes with a brief summary by the teacher. We chose the specific classroom scenario, including elements of formative assessment and a quiz for the IVR simulation, to allow a targeted experimental manipulation of the class’ performance level (and an examination of corresponding effects) in the available time while ensuring that classroom events would appear natural and authentic to the participating students.

Currently, only a few computational thinking concepts are routinely taught in the curricula of primary and lower secondary schools. Against this background, computational thinking was considered particularly suitable for the purpose of the study, as it could be assumed that the participating students had little to no prior knowledge and corresponding learning experiences with this subject matter. Consequently, social comparison effects could be investigated largely independently of students’ previous experiences, thus offering a way to look at the genesis of differences in self-concept in the field of computational thinking. In this vein, we also designed the IVR classroom (see Fig. 2) to look like a “blank slate”, resembling a classroom on the first day in a new school and class without any posters or student projects on the wall that could potentially influence students’ learning experiences in the IVR classroom (see for respective findings, e.g., Cheryan et al., 2011).

Fig. 2

Virtual classroom situation with different proportions of hand-raising classmates. Note: The top image shows a situation with 20% of the virtual classmates raising their hands, whereas the bottom image reflects the 80% condition. For a preview stream of the IVR lesson, see https://doi.org/10.17605/OSF.IO/JB8VQ

The whole IVR learning experience was situated in a simulated classroom (see Fig. 2). The IVR lesson was a fully preprogrammed simulation of a typical teaching situation based on audio recordings and motion captures stemming from a real classroom to ensure that the pace and content of the virtual classmates’ answers as well as their movements would be calibrated to reflect those typical of a sixth grader. Graphical representations of the virtual classmates and the teacher were designed by considering the Uncanny Valley effect (Mori et al., 2012) and aiming to capture an appropriate degree of (behavioral) realism (Bailenson et al., 2004; Guadagno et al., 2007). Participants experienced the classroom situation from the perspective of a student sitting in the virtual classroom surrounded by 24 virtual classmates. Participants were asked not to walk around in the virtual classroom and to remain seated and quiet, but they could engage in any other activities to explore the virtual environment (e.g., look around, raise their hands), and they were instructed to behave as they would in a normal classroom situation.

Experimental manipulation of virtual classmates’ hand-raising behavior

When the virtual teacher interacted with the virtual classmates, the experimental groups differed in terms of the proportion of classmates who raised their hands in response to the teacher’s questions or who indicated that they knew the correct solution to a task. After each question, the virtual teacher called on one of the hand-raising classmates. To ensure that participating students would clearly associate their classmates’ hand-raising with the classmates’ knowledge of the correct answer, the hand-raising classmates’ answers were always correct, and the teacher always acknowledged the correctness while engaging in a continuous dialogue with the virtual class. The proportion of virtual classmates who responded to the teacher’s questions and showed high-performing participation in the lesson was manipulated on four levels with 20% versus 35% versus 65% versus 80% hand-raising classmates (see Fig. 2 for an impression). We chose these four levels to ensure that we had (a) an effective study design with a limited number of conditions and enough participants to have sufficient power and (b) a differentiated picture of when aversive versus positive effects appeared. Hence, there was a relatively fine-grained difference between 20% and 35% as well as between 65% and 80%, whereas there was a larger difference between 35% and 65% to ensure differentiated grading and yet unambiguous information about whether the percentage of classmates who were high-achieving was below versus above average. Participants were randomly assigned to one of the experimental conditions (see Footnote 1).

Participants’ experience in the IVR classroom

The IVR environment was designed to ensure an authentic classroom experience for participants. We checked whether participants perceived their experience in the IVR classroom as authentic via self-reports. To this end, we assessed participants’ level of experienced presence in the IVR classroom with nine items (e.g., “I felt like I was sitting in the virtual classroom” or “I felt like the teacher in the virtual classroom really addressed me”) based on common conceptualizations of spatial and social presence (Lombard et al., 2009; Schubert et al., 2001). Moreover, we asked participants to rate the degree of realism of the IVR lesson with six items (e.g., “What I experienced in the virtual classroom could also happen in a real classroom” or “The behavior of the students in the virtual classroom was similar to how real classmates behave”). Both variables were rated on a 4-point rating scale ranging from 1 (not true at all) to 4 (absolutely true) and had acceptable Cronbach’s alpha values of 0.77 and 0.78, respectively.

The self-reports indicated high levels of experienced presence and perceived realism in the IVR environment across all experimental conditions: The reported mean levels of both variables ranged from 2.82 to 2.97 (0.52 ≤ SDs ≤ 0.62) on the 4-point scale. The experimental conditions had no statistically significant effects on participants’ experienced level of presence or their perceived realism of the IVR classroom (all ps > 0.05).

Measures

The results reported in this paper are part of a large IVR classroom study that was designed to answer several different research questions. Therefore, the questionnaires administered in the pretest and posttest included several variables that were not relevant for the present study, specifically the central personality characteristics of participating students (i.e., self-acceptance, self-concept of intelligence, social self-concept, social orientation, and learning and performance goal orientation), students’ experiences in the IVR classroom (i.e., perceived realism, experienced presence; see relevant details in the description of the immersive VR classroom), students’ interest in the IVR lesson reported after they participated, as well as a knowledge test covering the central content of the IVR lesson. In the following, we provide details for all measures that were used to answer the research questions in the present study.

Perceived performance level

The posttest questionnaire administered after the IVR experience included two measures of participants’ perceptions of their classmates’ performance-related behavior. First, participants had to indicate how many classmates responded to the teacher’s questions and raised their hands to indicate that they knew the correct answers. The reported number of registered hand-raising classmates was assessed with a multiple-choice question in which participants were presented with the seating plan of the IVR classroom and had to mark all the classmates who raised their hands (see Appendix for the response format).

Second, participants were asked to rate the perceived performance level of the virtual class via five items. These items reflected the systematically varied characteristics of the IVR classrooms and were developed specifically to assess students’ perceptions of the IVR classroom (see Appendix). The five items were rated on a 4-point rating scale ranging from 1 (not true at all) to 4 (absolutely true), and the scale had a Cronbach’s α value of 0.87.

Self-concept

Participants’ self-concepts in the domain of computational thinking were assessed at the beginning of the posttest questionnaire. In order to obtain a differentiated picture of effects on participants’ self-concepts, we administered two distinct self-concept scales with four and six items, respectively (based on the commonly used wording by Schwanzer et al., 2005, which we adapted for situation- and domain-specificity; see Appendix). The first scale assessed situational self-concept, tailored to the situation in the IVR classroom and focusing in particular on social comparisons with the virtual classmates. Two of its four items were reverse-scored and recoded accordingly. The second self-concept scale measured dispositional self-concept in the domain of computational thinking with six items. The items covered the core competencies associated with computational thinking (Grover & Pea, 2013; Román-González et al., 2017; Weintrop et al., 2016). Three of the items on the scale were reverse-scored and recoded accordingly. Both self-concept scales used a 4-point rating scale ranging from 1 (not true at all) to 4 (absolutely true), yielding acceptable Cronbach’s α coefficients of 0.71 and 0.73, respectively.
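To illustrate the scale construction described above, the following minimal sketch shows how reverse-scored items on a 4-point scale can be recoded and how Cronbach’s α can be computed from a participants-by-items matrix. The response data, the choice of which items are reverse-scored, and the function names (`reverse_score`, `cronbach_alpha`) are hypothetical and serve only as an illustration; they are not the study’s data or analysis code.

```python
import numpy as np


def reverse_score(items, low=1, high=4):
    """Recode reverse-scored items on a low..high rating scale (1<->4, 2<->3)."""
    return (low + high) - np.asarray(items, dtype=float)


def cronbach_alpha(scores):
    """Cronbach's alpha for a (participants x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                                  # number of items
    sum_item_variances = scores.var(axis=0, ddof=1).sum()
    total_score_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_variances / total_score_variance)


# Hypothetical responses of five participants to a four-item scale (1-4).
responses = np.array([
    [4, 1, 3, 2],
    [3, 2, 3, 2],
    [2, 3, 2, 3],
    [4, 1, 4, 1],
    [1, 4, 2, 4],
], dtype=float)

# Assumption for illustration: the second and fourth items are reverse-scored.
recoded = responses.copy()
recoded[:, [1, 3]] = reverse_score(responses[:, [1, 3]])

alpha = cronbach_alpha(recoded)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Recoding before computing α matters: without it, the reverse-scored items correlate negatively with the remaining items and the coefficient is deflated.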

Covariates

The pretest questionnaire included demographics (e.g., age, gender) and questions about participants’ prior computational thinking and IVR experience. Whereas the BFLPE has been argued to be generalizable across diverse student and contextual characteristics (Marsh & Seaton, 2015; Marsh et al., 2021), specifically gender (Plieninger & Dickhäuser, 2013; Preckel & Brüll, 2008; Thijs et al., 2010) and individual achievement (Huguet et al., 2009; Marsh et al., 2014; Trautwein et al., 2009) have repeatedly been discussed as determinants of students’ self-concept. Due to the experimental design, the covariates played a minor role in our study. However, despite the large sample, we could not be completely sure that our randomization was 100% effective. Therefore, we included participants’ gender and a proxy for their individual achievement in the IVR lesson (i.e., participants’ prior experience with the IVR lesson topic and their prior use of IVR technology in general) as covariates in our analyses to account for their potential confounding effects on students’ learning experiences, specifically their self-concepts, in the IVR classroom. Participants’ prior experience with computational thinking was assessed with a dichotomous variable (i.e., “Have you ever attended a course on programming?”). Participants had to answer yes or no. Prior experience with IVR was measured with one question asking participants whether they had ever been in an IVR environment (with a head-mounted display) before. Participants had to answer 0 (never), 1 (once), or 2 (more than once).

Statistical analyses

Hand-raising behavior as the experimental manipulation on four levels was included in all regression analyses via three dummy variables. We dummy-coded the hand-raising conditions to gain insights into which experimental conditions differed from each other and how; more specifically, we opted for paired comparisons against one baseline condition (Condition 1: 20% hand-raising) to address our hypotheses suggesting that with increasing proportions of hand-raising classmates (Conditions 2 to 4), the effects on our outcome variables would become more pronounced compared with the baseline condition. All regression models were calculated with the full sample and hence included three pairwise comparisons of the dummy-coded hand-raising conditions against the baseline condition of 20% hand-raising classmates.
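The dummy coding described above can be sketched as follows. The example is a simplified illustration under several assumptions: the outcome values are invented, the regression is an ordinary least squares fit with unstandardized coefficients (the paper reports standardized coefficients estimated in Mplus), and the names `dummy_code` and `fit_baseline_contrasts` are hypothetical.

```python
import numpy as np

CONDITIONS = (20, 35, 65, 80)   # percent hand-raising classmates
BASELINE = 20                   # Condition 1 serves as the reference group


def dummy_code(condition):
    """Return three 0/1 indicators (35%, 65%, 80%), each contrasted with 20%."""
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition}")
    return [int(condition == level) for level in CONDITIONS if level != BASELINE]


def fit_baseline_contrasts(conditions, outcome):
    """OLS fit: intercept = baseline mean, slopes = differences from baseline."""
    X = np.array([[1.0] + dummy_code(c) for c in conditions])
    coef, *_ = np.linalg.lstsq(X, np.asarray(outcome, dtype=float), rcond=None)
    return coef


# Hypothetical outcome scores, two participants per condition.
conditions = [20, 20, 35, 35, 65, 65, 80, 80]
outcome = [3.0, 3.2, 3.0, 3.0, 2.9, 3.1, 2.6, 2.8]

coef = fit_baseline_contrasts(conditions, outcome)
print(dict(zip(["baseline_20", "35_vs_20", "65_vs_20", "80_vs_20"], coef.round(2))))
```

With this coding, each slope is one pairwise comparison against the 20% condition, which is why three dummy variables suffice for four conditions.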

As the data were collected at the participants’ schools, for practical reasons, the testing groups always consisted of students who belonged to the same school. The experimental conditions were randomly assigned to participants on an individual level across the testing groups. To account for the nested data structure, we controlled for cluster effects by using a school variable in all analyses (number of clusters N = 12). All models were estimated in Mplus 8.2, and full information maximum likelihood estimation was used to deal with missing values (Muthén & Muthén, 1998–2017).

To examine whether classmates’ different hand-raising behaviors predicted the perceived performance level of the class and had an effect on students’ self-concepts, we computed multiple linear regression analyses with the experimental hand-raising conditions as the independent variable and the perceived performance level and the self-concept measures as the dependent variables. We followed the suggested procedure for a regression-based approach to mediation analysis by Hayes (2017). That is, given significant direct effects on the self-concept measures, we included the perceived performance level in the regression model as an additional predictor (in addition to the dummy-coded hand-raising conditions) to test whether it mediated the effect of hand-raising behavior on the self-concept measures as the dependent variables. A mediation would mean that the mediating variable yielded a statistically significant effect on the dependent variable, whereas the direct effect of the hand-raising conditions on the dependent variable was reduced in size and turned statistically nonsignificant (Hayes, 2017). Given this pattern of results, we calculated and tested the indirect (i.e., mediation) effect by multiplying the regression weight for the association between the experimental conditions and the perceived performance level by the regression weight for the association between the perceived performance level and students’ self-concept, while controlling for differences between the experimental conditions. As the bootstrapping procedure has been shown to be able to obtain statistical power comparable to that of other procedures (Hayes & Scharkow, 2013), we used 10,000 bootstrapped samples and 95% confidence intervals in order to account for a potentially asymmetrical distribution of the indirect effect (MacKinnon et al., 2002; Preacher & Kelley, 2011). Finally, following suggestions by Mayer et al. (2014), who argued that potential confounds should also be considered in strictly randomized research designs, we reran all the regression models, additionally including relevant background variables (i.e., participants’ gender, prior experience with the lesson topic, and prior use of IVR technology) as covariates. To avoid overloading the manuscript, the paper only reports results from the regression models without the covariates as additional parameters; detailed results of the robustness check are provided in the online supplement (see Footnote 2).
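The product-of-coefficients logic of the indirect effect and its bootstrapped confidence interval can be sketched as follows. This is a simplified illustration, not the Mplus analysis used in the study: it uses simulated data, a single dummy variable (e.g., 80% vs. 20%), unstandardized coefficients, and percentile intervals (the interval type is an assumption), and it ignores the clustering by school and the handling of missing values. All function and variable names are hypothetical.

```python
import numpy as np


def ols(predictors, y):
    """Least-squares coefficients with an intercept in position 0."""
    X = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def indirect_effect(x, m, y):
    """a * b: (condition -> mediator) times (mediator -> outcome | condition)."""
    a = ols(x, m)[1]
    b = ols(np.column_stack([x, m]), y)[2]
    return a * b


def bootstrap_ci(x, m, y, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample participants with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    tail = (1 - level) / 2 * 100
    return np.percentile(estimates, [tail, 100 - tail])


# Simulated data (illustrative only): x = condition dummy, m = perceived
# performance level, y = situational self-concept.
rng = np.random.default_rng(1)
n = 200
x = rng.integers(0, 2, size=n).astype(float)
m = 0.8 * x + rng.normal(size=n)
y = -0.3 * m + rng.normal(size=n)

estimate = indirect_effect(x, m, y)
# Fewer resamples than the 10,000 reported in the paper, to keep the demo fast.
lower, upper = bootstrap_ci(x, m, y, n_boot=2_000)
print(f"indirect effect = {estimate:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

Because the product of two regression weights is generally not normally distributed, the interval is read directly from the resampled estimates rather than from a symmetric standard-error formula; an interval that excludes zero indicates a statistically significant indirect effect.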

Based on strong theoretical and empirical evidence, all hypotheses for the perception of the manipulated hand-raising behavior (i.e., number of registered hand-raising students and perceived performance level of the class) and regarding situational self-concept were directional and were thus tested with one-tailed tests. The remaining exploratory hypotheses on the effects on dispositional self-concept were tested with two-tailed tests. We used a critical p-value and confidence intervals set at an alpha level of 0.05 for all hypothesis tests, and we report standardized regression coefficients for all regression analyses. To account for the fact that we computed a number of significance tests, we applied the Benjamini–Hochberg correction (Benjamini & Hochberg, 1995, 2000) to avoid the accumulation of Type I errors. As we report standardized regression coefficients, these can be interpreted as effect sizes. We additionally calculated Cohen’s d for standardized mean differences in dummy-coded categorical variables (Cohen, 1988).
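As an illustration of the Benjamini–Hochberg step-up procedure mentioned above, the following minimal sketch flags which of a set of p-values remain significant when the false discovery rate is controlled at 0.05. The p-values are hypothetical and the function name `benjamini_hochberg` is our own; the sketch does not reproduce which tests were grouped together in the study.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array marking hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of p-values, ascending
    thresholds = alpha * np.arange(1, m + 1) / m   # i/m * alpha for rank i
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest rank meeting its threshold
        rejected[order[:k + 1]] = True             # reject all hypotheses up to that rank
    return rejected


# Hypothetical p-values from five significance tests.
p_values = [0.045, 0.001, 0.200, 0.028, 0.012]
print(benjamini_hochberg(p_values))
```

In this example, the test with p = 0.045 would be significant at an uncorrected alpha of 0.05 but is not rejected after the correction, because its rank-specific threshold is 0.04.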

Results

Table 1 presents descriptive statistics for the basic sample characteristics and covariates after participants were randomized to one of the hand-raising conditions. There were no significant between-group differences with respect to the basic sample characteristics and covariates measured before the students experienced the IVR classroom situation.

Table 1 Descriptive sample statistics after randomization to hand-raising conditions

Do different variations of hand-raising behavior lead to distinct perceptions of the class’ performance level? (RQ 1)

We first tested whether students (a) recognized and recalled their classmates’ hand-raising behavior and (b) obtained different perceptions of the class’ performance level on the basis of the manipulated hand-raising behavior. As expected, descriptive statistics (see Fig. 3) showed continuously increasing mean values for the reported number of registered hand-raising classmates from 20 to 80% hand-raising classmates.

Fig. 3 Numbers of (registered) hand-raising classmates in the hand-raising conditions. Note: The possible answer range for the registered hand-raising classmates was between 0 and 24. The number of actual hand-raising classmates indicates the true value of the number of classmates who raised their hands in the respective condition.

Results indicated that the experimental hand-raising conditions significantly predicted the number of registered hand-raising classmates. There was no significant effect of 35% compared with 20% hand-raising classmates (β = 0.08, SE = 0.05, p = 0.069), but 65% (β = 0.28, SE = 0.07, p < 0.001, d = 0.79) as well as 80% (β = 0.58, SE = 0.05, p < 0.001, d = 1.48) hand-raising classmates led to significantly higher numbers of registered hand-raising classmates compared with the 20% condition.

Similarly, the perceived performance level of the class was positively predicted by the proportion of hand-raising classmates (see the descriptive statistics in Fig. 4). Thirty-five percent compared with 20% hand-raising classmates did not result in significantly different perceptions of the class’ performance level (β = 0.06, SE = 0.05, p = 0.099). However, compared with 20% hand-raising classmates, the perceived performance level was significantly higher for 65% (β = 0.25, SE = 0.04, p < 0.001, d = 0.60) as well as for 80% (β = 0.45, SE = 0.07, p < 0.001, d = 1.24) hand-raising classmates.

Fig. 4 Perceived performance level of the class in the hand-raising conditions

There was a significant positive correlation between the reported number of registered hand-raising classmates and the perceived performance level of the class, r(333) = 0.30, p < 0.001.

Do the variations in hand-raising behavior affect participants’ self-concepts? (RQ 2)

The second research question asked how the virtual classmates’ hand-raising behavior was related to participants’ situational and dispositional domain-specific self-concepts. As expected, descriptive statistics (see Table 2) showed that the mean values for situational self-concept in the different conditions continuously decreased as the proportion of hand-raising classmates increased from 20 to 80%.

Table 2 Descriptive statistics for the mean self-concept values in the hand-raising conditions

As Table 3 shows, the two conditions with 35 and 65% hand-raising classmates did not show any statistically significant differences in situational self-concept in comparison with the condition with 20% hand-raising classmates (β = −0.02, SE = 0.06, p = 0.390 and β = −0.04, SE = 0.06, p = 0.252 for 35 and 65%, respectively). However, the condition with 80% hand-raising classmates had a statistically significant negative effect on the mean level of situational self-concept compared with the 20% condition (β = −0.12, SE = 0.05, p = 0.015, d = 0.30), indicating that a higher number of hand-raising classmates led participating students to evaluate their own abilities in the IVR lesson as worse. In contrast to this, the hand-raising conditions were not associated with significant differences in dispositional self-concept (β = −0.01, SE = 0.07, p = 0.894 for 80% vs. 20% hand-raising classmates).

Table 3 Standardized regression coefficients for effects of the different hand-raising conditions on situational and dispositional self-concepts when including the mediator

Do different perceptions of the class’ performance level explain effects on participants’ self-concepts? (RQ 3)

The perceived performance level was significantly negatively related to situational self-concept (β = −0.19, SE = 0.05, p < 0.001), showing that students who perceived the class’ performance level as higher reported lower levels of situational self-concept. However, the perceived performance level of the class did not lead to any significant differences in dispositional self-concept (β = −0.03, SE = 0.04, p = 0.426). The reported number of registered hand-raising classmates did not predict any significant differences in situational self-concept (β = −0.05, SE = 0.04, p = 0.122) or dispositional self-concept (β = 0.06, SE = 0.07, p = 0.423).

As can be seen in Table 3, including the perceived performance level in the regression model (Model 3) revealed a significant negative association between the perceived performance level and situational self-concept (β = −0.18, SE = 0.04, p < 0.001). In addition, including the perceived performance level substantially reduced the direct effect of the manipulation (particularly the effect of 80% vs. 20% hand-raising classmates, see Model 1 for comparison) on situational self-concept so that it became statistically nonsignificant (β = −0.04, SE = 0.06, p = 0.267). Comparing Model 3 against Model 1 thereby suggests that the perceived performance level fully mediated the relationship between the different hand-raising behaviors of virtual classmates (80% vs. 20%) and situational self-concept. The indirect effect on situational self-concept calculated with 10,000 bootstrapped samples for the pairwise comparison of the 80% vs. 20% hand-raising condition and the perceived performance level as a mediator was small but statistically significant (β = −0.05, SE = 0.02, p < 0.001, 95% CI [−0.08, −0.02]).

As a robustness check (see suggestions by Mayer et al., 2014), we finally added some background variables (participants’ gender, prior experience with the lesson topic or IVR) to the regression models. Adding the covariates did not change the size or statistical significance of the reported effects in any of the models. The effects of the hand-raising conditions on the number of registered hand-raising classmates and on the perceived performance level (RQ 1) as well as the effect of the different hand-raising behaviors (80% vs. 20%) on situational self-concept (RQ 2) and the mediating effect of the perceived performance level (RQ 3) remained statistically significant when the background variables were added to the model. Detailed statistics for all the regression models including background variables are provided in the online supplement.

Discussion

The present study examined how individual differences in students’ academic self-concepts emerge and how these differences can be explained by students’ perceptions of implicit performance-related information in their classmates’ behavior (i.e., hand-raising). By systematically examining the BFLPE on the basis of authentic classroom behavior, the present study aimed to provide causal evidence that social comparison processes in classrooms are the underlying mechanism that leads to differential effects on students’ academic self-concept. The study used an IVR classroom with a systematic variation of classmates’ performance-related (i.e., hand-raising) behavior. Results provided support for the three hypotheses. Figure 5 shows a summary of the examined and revealed effects.

Fig. 5 Summary of the revealed effects in one structural model. Note: A summary of all examined hypothesized relationships from the different statistical models is depicted. Bold arrows indicate statistically significant relationships; dashed arrows indicate relationships that were explored in addition to the hypotheses.

First, the study showed that, as expected, classmates’ hand-raising behavior positively predicted (a) the number of hand-raising classmates that participants registered and recalled as well as (b) the perceived performance level of the class (Hypothesis 1). Second, the results provided support for the hypothesis that classmates’ hand-raising negatively affected participants’ situational self-concept (Hypothesis 2). More specifically, the expected negative effect occurred only for situational self-concept and between the extreme conditions (i.e., 20% vs. 80% hand-raising) but not for dispositional self-concept or for more moderate standards of comparison. The effect was rather small yet fully in line with predictions that were based on the BFLPE. Lastly, whereas the reported number of registered hand-raising classmates had no significant (indirect) effect on situational self-concept, results supported our expectation that the perceived performance level would mediate the effect of classmates’ hand-raising behavior on students’ situational self-concept (Hypothesis 3): The more classmates raised their hands, the higher the perceived performance level of the class, and this perception in turn negatively predicted students’ situational self-concept.

Corroborating evidence for the BFLPE from an authentic experimental design

A large number of studies have provided evidence that comparisons with higher achieving peers lead to negative effects on students’ self-concept when individual achievement is controlled for—a finding prominently known as the BFLPE (Marsh & Hau, 2003; Marsh et al., 2021; Nagengast & Marsh, 2012; Seaton et al., 2009, 2010). However, an explicit investigation of the direct role of social comparison processes for the BFLPE has been missing so far because large-scale research in school settings makes truly randomized designs impractical and ethically difficult to realize. Whereas previous BFLPE studies have thus mostly relied on descriptive and correlational approaches, existing experimental social comparison studies do allow for causal conclusions but typically cannot reflect the complexity of social comparisons in real-world settings (i.e., implicitly provided rather than explicitly given social comparison information that needs to be discovered by students). By using an IVR classroom, the present study specifically contributes to research on the BFLPE as it used actual classroom behavior as social comparison information to investigate social comparison processes and associated effects among students in an authentic yet controllable way.

This new approach yielded a number of relevant findings. Perhaps most importantly, the study attests to the ubiquity of social comparison processes in students. Researchers have long argued that social comparison processes are highly pertinent in everyday life (e.g., Buunk & Gibbons, 2007; Buunk & Mussweiler, 2001; Festinger, 1954). However, in educational contexts, there is still a question about the relative importance of internally processed, implicit social comparison information (e.g., the observable academic behavior of fellow students) versus explicitly presented social comparison information (e.g., direct feedback from teachers and school grades; Lüdtke et al., 2005; Trautwein & Lüdtke, 2005). The present study provides evidence of social comparison processes among students on the basis of social comparison information that had to be derived from classmates’ performance-related behavior. In other words, participants in the study did not receive explicit feedback on their own and the class’ performance level but had to infer it from their classmates’ behavior (i.e., hand-raising associated with correct responses to the teacher’s questions). In fact, our results showed that the students noticed what was happening in the classroom and that virtual classmates’ hand-raising behavior was strongly associated with participants’ perceptions of the class’ performance in the IVR classroom. We found that the experimental variations in classmates’ hand-raising behavior corresponded to (a) the reported number of registered hand-raising classmates and (b) the perceived performance level of the class. However, only the perceived performance level was found to predict differences in situational self-concept. 
In other words, solely recognizing and remembering that larger or smaller numbers of classmates showed a certain performance-related behavior did not affect participants’ self-evaluations, but what mattered was the respective perception of this behavior as a performance indicator. Similar to the argument made by Huguet et al. (2009), this finding suggests that students had to actively process their classmates’ performance-related behavior (i.e., derive the available social comparison information from it), and these perceptions ultimately explained the effects on their self-evaluations.

Second, we found experimental support for the BFLPE. Whereas Dai and Rinn (2008) called into question whether social-contextual influences are the major reason for the BFLPE, the present study showed that by varying only classmates’ behavior, the typical BFLPE results could be reproduced, and when classmates were higher performing, students’ situational self-concept was lower. Importantly, in light of the minimal intervention that led to the effect (only a 15-min exposure to different peers’ hand-raising behavior), we consider the small effect meaningful, particularly against the background that many intervention studies report small effect sizes for effects on students’ self-concepts (O’Mara et al., 2006).

Third, we found support for the underlying social comparison mechanism as assumed in BFLPE research. The experimental manipulation of classmates’ hand-raising behavior significantly impacted students’ perceptions of the class’ overall performance level, and these perceptions in turn predicted differences in situational self-concept. Specifically, the direct effect of the hand-raising conditions on students’ situational self-concept turned nonsignificant when the perceived performance level was added to the regression model, suggesting that students’ perceptions of the class’ performance level fully explained the effect of classmates’ hand-raising behavior on students’ situational self-concept. The overall effects were rather small, but they allow important insights into the mechanisms that underlie the BFLPE, highlighting the role of individual perceptions of the class’ performance level as determinants of students’ situational self-concept. In doing so, the study presents an important step in the direction suggested by Collins (2000), who emphasized the need for more naturalistic studies that can account for individually shaped perceptions of social comparison information. Moreover, the results showed how easily the social environment impacts students’ (self-)evaluations as the different perceptions of the class’ performance level and respective effects on situational self-concept were based solely on a manipulation of hand-raising behavior and occurred after only 15 min of experiencing the classroom situation.

Notably, as an additional important finding, we found effects only on situational self-concept and between the extreme standards of comparison (i.e., 80% vs. 20% high-performing classmates), whereas dispositional self-concept in the domain of computational thinking was not affected. The lack of an effect on dispositional self-concept is not surprising considering the results of a recent meta-analysis (O’Mara et al., 2006) that suggested that effects of self-concept interventions are mostly observable on a domain- or situation-specific level rather than with respect to general self-evaluations. In fact, considering the importance of academic self-concept and its long-term effects for individual academic trajectories (Nagengast & Marsh, 2012; Valentine et al., 2004), it would not have been desirable for a 15-min classroom experience to have an observable effect on dispositional self-concept in a specific domain. However, on the basis of the notion of a multifaceted self-concept and the assumption that more long-term and enduring self-perceptions are substantially shaped by single situations (Harter, 1986; Marsh & Shavelson, 1985; Suls & Mullen, 1982; Wrzus & Roberts, 2017), it can be assumed that repeatedly evoking effects on situational self-concept will eventually lead to effects on dispositional self-concept. In other words, the present study enabled us to determine how differences in students’ situational self-concept emerge as a result of social comparisons in a classroom situation. The observed differences might result in more stable differences in self-concept if students repeatedly experience classroom situations that resemble the ones used in the present experiment.

The main objectives of the present study were to experimentally test the BFLPE and to systematically investigate the corresponding social comparison processes in the classroom; consequently, we focused on the effects of the IVR classroom situation on students’ perceptions of their classmates and resulting differences in their self-concepts. Learning outcomes were not within the scope of the present study; however, considering the large body of research on students’ self-concept, we would like to emphasize that the educational ramifications of social comparisons in the classroom go beyond the effects on students’ self-concept and concern individual differences in achievement (Trautwein & Möller, 2016; Valentine et al., 2004), achievement emotions (Pekrun et al., 2019), as well as academic choices and career aspirations (Göllner et al., 2018; Nagengast & Marsh, 2012). With this in mind, the findings of the present study also offer an outlook on how the affordances of IVR technology can be used not only for study purposes but also to purposefully utilize peer effects in the classroom to create more effective learning scenarios with simulated virtual classmates that benefit students’ self-concepts and ultimately improve their learning experience in general (see, e.g., Hasenbein et al., 2022).

Limitations and future directions

The present study made use of the potential of IVR as an experimental tool for investigating social psychological processes in a standardized yet authentic classroom situation. Ever since IVR was first proposed and tested (Bailenson et al., 2008; Blascovich et al., 2002), its potential for classroom research has rarely been exploited (e.g., Blume et al., 2019; Kizilcec et al., 2015). As a result, a comprehensive evidence-based guideline for using IVR classrooms as an experimental tool is still lacking. We argue that this methodological advancement is promising but needs to happen step-by-step, grounded in systematic research that allows researchers to understand exactly how the potential of IVR can be tapped to improve classroom research. The present study provides initial insights that we see as the necessary groundwork for many further studies that will examine how IVR classrooms can be used to gain more in-depth systematic insights not only into social comparisons but also into other classroom processes. In the following, we highlight three aspects of the present IVR setting that are important to consider when drawing conclusions from its results and when aiming to replicate and extend the present study’s findings, namely, the duration of the experiment, the central manipulation of performance-related behavior, and the content of the IVR lesson.

First, we used recordings and motion captures stemming from a real sixth-grade classroom to ensure that our IVR simulation would reflect an authentic classroom experience for the participating sixth graders. However, for practical as well as economic reasons, the IVR lesson in the present study lasted for only 15 min and therefore represents only a snippet of the reality of a real-world classroom. As students’ self-reports indicated, they perceived the IVR lesson, including the events and people in it, as realistic and judged it as similar to what they have experienced in real-world classrooms. The results of the present study suggest that 15 min were sufficient for participants to engage in the IVR scenario and to recognize all relevant cues to obtain the desired impression of the classroom situation. Nevertheless, the limited duration of the IVR lesson had two important consequences that need to be discussed, concerning (a) the experimental design and (b) the conclusions that can be drawn from the study’s findings. Regarding the first aspect, in order to ensure that our experimental manipulation was as unambiguous as possible (i.e., that participants clearly associated their classmates’ hand-raising with the classmates’ knowledge of the correct answer), we had virtual classmates respond only with correct ideas when they raised their hands and were called on by the teacher. A longer IVR lesson would give researchers more options in this regard but would simultaneously have other consequences of a technical and content-related nature; for example, modeling question–answer sequences in the form of a Socratic dialogue between the teacher and the class would emphasize learning effects in an IVR classroom much more, providing promising avenues for future research that is more focused on learning outcomes, which were not within the scope of the present study. 
Regarding the second aspect on interpreting the study results, we found effects on situational self-concept but not on dispositional self-concept. Future research should investigate this distinction more closely in longer and repeated learning sequences. It would be interesting to see whether the effects can be observed for more general dispositional self-concept after longer or repeated experiences in comparison with the present study’s IVR classroom.

Second, the only indicator of performance used and manipulated in the present study was the virtual classmates’ hand-raising behavior, which occurred 13 times throughout the IVR lesson. To ensure that hand-raising as the experimental manipulation was unambiguously associated with classmates’ performance, hand-raising was always coupled with correct responses to the teacher’s questions once a hand-raising classmate was called on, and we varied only the proportion of hand-raising classmates per condition. In fact, we found that participants recognized virtual classmates’ hand-raising behavior as an indicator of the class’ performance level and used it as social comparison information, resulting in differences in participants’ situational self-concept. However, the effect of hand-raising behavior on situational self-concept was rather small and occurred only when we compared the “extreme” hand-raising conditions (80% vs. 20%). Notably, even though participants perceived the IVR classroom setting as realistic and authentic in general, it is necessary to consider the fact that, in real classroom settings, there are many different sources of implicit but also more explicit social comparison information and a very salient evaluative atmosphere that is shaped by known peers, constant performance feedback, and events that go beyond isolated teacher-student interactions (Dijkstra et al., 2008; Levine, 1983; Wheeler & Suls, 2005). The present study examined classmates’ hand-raising as a performance indicator, but we argue that performance-related behaviors can be observed in many more ways than the hand-raising that was manipulated in the present study. However, the present study demonstrated the potential of IVR classrooms for experimental and yet authentic study designs that allow researchers to disentangle the numerous influences at play when students evaluate themselves in a classroom situation (i.a., in relation to their classmates). 
Against this background, future studies should examine other behavioral cues (and their interplay) that also affect students’ perceptions of a class’ performance level and ultimately lead to differences in students’ self-concepts, for instance, the quality of classmates’ responses and the level of effort demonstrated by peers (see, e.g., Muenks et al., 2016).

Third, the IVR lesson was designed specifically for sixth-grade academic track students. We selected the learning material on computational thinking as a topic that is not included in the curriculum of academic track schools before Grade 7, and our findings consequently apply only to our sample of Grade 6 students, whom we expected to be rather unacquainted with the subject matter. We checked whether students had prior experience with the lesson topic (e.g., through extracurricular activities) before they participated in the experiment, but controlling for the fact that some students might have already known the answers to the teacher’s questions did not change the results. However, we would like to note that we used prior experience with the lesson topic only as a proxy and did not include a direct measure of how many of the teacher’s questions in the IVR lesson the students were actually able to answer. Whereas our randomized study design rules out systematic differences between the experimental conditions, future research should examine the effects of individual achievement on interpretations of classmates’ performance-related behavior more closely. For future research, it would additionally be worthwhile to replicate the present study with less experienced students from other class levels or school types or even with more demanding material in the IVR lesson to examine whether the observed effects would be more pronounced under these circumstances. With regard to the IVR lesson topic, we would like to point out that in STEM-related domains, such as the one used in the present study, gender differences in students’ self-reported self-concepts appear to be quite common (e.g., Friedrich et al., 2015; Frome & Eccles, 1998; Plieninger & Dickhäuser, 2013; Preckel & Brüll, 2008; Thijs et al., 2010; Tiedemann, 2000; Trautwein & Möller, 2016). We controlled for students’ gender in our robustness check and did not find an effect on the present study’s results.
However, it seems worthwhile for future studies to further use the potential of IVR classrooms to systematically and yet authentically investigate the influence of individual moderating variables on the effects of classmates’ performance-related behavior on students’ self-concept (e.g., personality traits or achievement goals; Jonkmann et al., 2012; Wouters et al., 2013).

Conclusion

The present study used IVR as a novel approach for testing the BFLPE and investigating associated social comparison processes. The standardized yet authentic IVR setting allowed us to provide evidence for the causality of the BFLPE and yielded important insights into the mechanisms that underlie the effect. The results indicate how ubiquitous social comparisons in the classroom are and highlight the major role of students’ perceptions of their classmates when explaining differences in self-evaluations. Moreover, our findings showed how easily the social environment can affect learners’ situational self-concept and thus emphasize the necessity to consider the situation-specificity of self-concept when examining effects on self-evaluations. Beyond these aspects, not only do the results of the present study provide new insights into the emergence of social comparison effects in the classroom, but they also make a general contribution to the use of virtual reality in educational research. By replicating the empirically well-supported BFLPE, the results of the present study provide support for the feasibility and validity of conducting experimental studies in an IVR classroom and thus provide the grounds for establishing IVR as a promising tool for experimental studies in educational psychological research. Based on the study’s findings, several directions for future research are discussed to extend the use of IVR’s technical affordances to further examine classroom processes, such as social comparisons and beyond.