1 Introduction

Music has the power to change our mood in any direction and can transport us through our memories, making us relive experiences we thought we had forgotten. For those who play it, performing music can be a pleasurable experience that allows them to reach a state of flow. In this state, musicians cease to be self-conscious and enter into a total connection with their instrument, allowing them to reach their full performance potential (Chirico et al. 2015). However, performing music can become a source of fear, tension and discomfort if the performer suffers from performance anxiety (Lee et al. 2023).

Musical performance anxiety (MPA) can be defined as "the experience of persistent, distressing apprehension and/or actual impairment of performance ability in a public context to a degree that is unwarranted given the individual’s musical aptitude, training, and level of preparation" (Salmon 1990). Performance anxiety can be considered a social anxiety disorder. According to the Diagnostic and Statistical Manual of Mental Disorders (APA 2013), performance anxiety is characterised by a marked fear or anxiety in social situations where an individual is exposed to possible scrutiny by others, which is disproportionate to the actual threat posed by the social situation and the socio-cultural context. As noted by Osborne and Kirsner (2022), a crucial element in this definition is the emphasis on the role of the social context and perceived evaluation by others, which can be seen as a primary factor in the development of destructive cognition and internal pressure.

MPA is one of the most common disorders among professional musicians. According to a review conducted by Fernholz et al. (2019), about one third of professional musicians consider MPA to be a serious problem and about sixty per cent of them have experienced some form of MPA at least once in their careers. Osborne and Kirsner (2022) divided the ways in which MPA can manifest into three dimensions: Increased physiological arousal, cognitions and behaviours. These dimensions interact and may reinforce each other over time. Musicians may experience increased autonomic arousal responses when they feel helpless and unable to predict or control future performance outcomes. In addition, musicians may hold negative beliefs about themselves as performers and about their social environment. These beliefs can reinforce each other and lead to a downward spiral of panic, worsening physiological symptoms and reinforcing beliefs of uncontrollability. In addition, the cognitive and physiological effects of MPA can have a significant impact on musicians’ behaviour. They may choose less challenging repertoire or avoid performing in order to reduce anxiety. Over time, this avoidance behaviour can lead to a lack of motivation to perform and a decline in performance skills.

Numerous factors can contribute to creating or increasing this type of anxiety; Papageorgi et al. (2007) grouped them into three categories. The first category includes intrinsic factors of the individual, such as age, gender, sensitivity to criticism or self-concept, as well as extrinsic and cognitive characteristics, such as the quality of previous similar experiences, cognitive style or outcome expectations. The second category refers to factors related to the difficulty of the task and the individual’s preparation. Finally, the third category focuses on the characteristics of the performance environment and how the musician perceives it.

Although there are various methods of treating anxiety caused by the factors included in the first two categories (Spahn 2015), most of these factors either cannot be changed or altering them could produce other adverse effects. However, it is possible to modify the characteristics of the environment to create spaces that are more comfortable and pleasant for musicians (Bissonnette et al. 2016). Of the various environmental factors that can potentially affect MPA, the presence of the audience is probably the most important.

Several studies have investigated the effect of audience presence on anxiety levels. A study of university students (Jackson and Latané 1981) confirmed that the nervousness and tension associated with the anticipation of performing in front of an audience increased as the size of the audience grew, but decreased as the number of performers increased. Similarly, Hamann (1982) conducted a study comparing the anxiety levels of music students when performing in front of an audience versus performing in front of a tape recorder. The results showed that anxiety levels were significantly higher when performing in front of an audience. LeBlanc et al. (1997) observed a similar pattern, finding that the presence of an audience also led to increased anxiety in high school students. In this context, virtual reality (VR) environments offer an easy and controlled way to conduct these types of experiments. Several studies provide evidence that performing in front of a virtual audience induces performance anxiety in a similar way to real-life environments.

Pertaub et al. (2002) compared people’s anxiety responses to an audience of 8 virtual humans in an immersive and a non-immersive environment. They examined how the audience’s attitude affected participants’ anxiety levels and found that a negative audience attitude significantly increased anxiety levels compared to a positive or neutral audience. Orman (2004) investigated the effect of the audience on the anxiety levels of saxophonists performing in a virtual environment. Although the sample size did not allow for reliable conclusions, the results of the experiment suggested that spaces with an audience were more anxiety-provoking than empty spaces, and there was also evidence that spaces where the virtual audience was a jury were more anxiety-provoking than those where the audience consisted of classmates. In a similar experiment (Bissonnette et al. 2016), a virtual audience was used as an exposure therapy for MPA. Different audience sizes were used to create different scenes. Although the experimental design found no significant differences between audience sizes in musicians’ anxiety levels, a decrease in anxiety levels was found between different sessions. In the same vein, Mostajeran et al. (2020) investigated the effect of audience size on anxiety using a virtual Trier Social Stress Test. They compared three audience sizes: 3, 6 and 15. Although an increase in the physiological anxiety response was observed in virtual reality, it was lower than in live experiments. The effect of virtual audience size was only observed on heart rate, which was higher in the 3-person virtual audience experiment.

All of the above studies, in both real and virtual environments, compare either an empty room with a room with an audience, or different audience sizes. In practice, however, it is often not possible to control the size of the audience, since a reduction in the audience would lead to a reduction in revenue. In this sense, recent findings in the field of neuroarchitecture offer a possible solution to this problem by suggesting that different variables of the built environment can significantly alter the perception of spaces (Karakas and Yildiz 2020). For example, the aspect ratio of a room’s floor plan can affect the perception of space by making two rooms of the same size appear to be different sizes (Sadalla and Oxley 1984), and the colour of lighting and the aspect ratio of windows can alter people’s subjective thermal perception (Vittori et al. 2021). A study conducted with university students (Fich et al. 2014) showed that architectural design can alter the response to psychological stress. The study used a virtual version of the Trier Social Stress Test to determine whether students experienced different levels of stress in different versions of a room. The results of the study suggest that in the more enclosed spaces, where the audience was almost the only visual stimulus, participants showed greater cortisol reactivity to stress induction. However, neuroarchitectural experiments have limitations, both in terms of sample size and forms of assessment, which restrict the generalisability of their results and call for further research (Assem et al. 2023).

A building is the result of a unique combination of thousands of variables, and the effect it has on each person can vary depending on many factors, such as their age or the activity they perform (Giuliani and Scopelliti 2009). For this reason, it is important to evaluate designs from the user’s perspective. Traditionally, this type of evaluation has been carried out after the building has been constructed, which is time consuming and costly if improvements are to be made based on the evaluation. To address this limitation, the concept of pre-occupancy evaluation (PrOE) is emerging; by evaluating the building at the design stage, it is possible to make changes to the design based on the outcome of the evaluation with minimal impact on the cost of the project (Shen et al. 2013). Virtual reality (VR) technologies can be of great use in conducting a PrOE, as they allow the future users of a building to immerse themselves in a virtual replica and see the different features of the built environment in a similar way to the real world; and in this way it is possible to get quick feedback on the architectural design (Tseng et al. 2017). Moreover, it is not even necessary to create a specific model to perform a PrOE in VR; the geometry of a Building Information Modelling (BIM) model can be exported to a game engine and rendered with a high level of realism (Gómez-Sirvent et al. 2023).

Returning to the subject of MPA, the most desirable scenario is that those who suffer from it manage to overcome or mitigate it so that they can perform comfortably in any environment. This is not always achieved, however, leading some musicians to consider abandoning their musical studies (Hernández et al. 2018). This raises the question: if the environment affects MPA, and there are easy ways to study its effects through VR, why not design spaces where musicians feel less exposed?

To the best of our knowledge, the possibility of altering the effect of the audience on the musicians through changes in the design of the environment, and without having to reduce the number of spectators, has not been explored before. In this study, we investigate how the intensity of the ambient lighting and the distance to the audience can influence the way musicians perceive the audience and the environment in general. For this purpose, an auditorium with a capacity of more than 800 people was modelled using BIM software and filled with animated virtual humans to simulate the presence of the audience. Using a head-mounted display (HMD), the musicians were shown the virtual auditorium and asked to play their instruments at different levels of ambient lighting and distance from the audience. To assess the impact of these variables, the participants’ eye movements were recorded while they played their instruments, and questionnaires were used to collect information on how they perceived the environment in the different conditions. The results of this study may help to understand the factors that influence the experience of musicians during their performances and may serve as a basis for future research.

2 Materials and methods

2.1 Virtual environment

The environment used for the experiment was a virtual auditorium with 807 seats. A simplified floor plan is shown in Fig. 1. The auditorium has an area of approximately \(1080 \,\hbox {m}^{2}\), divided into a stalls area of \(650 \,\hbox {m}^{2}\), a stage of \(130\, \hbox {m}^{2}\), and \(300 \,\hbox {m}^{2}\) of amphitheatre and boxes. The auditorium was designed in Autodesk Revit. Given the objective of the experiment, only the architectural elements visible from the stage were modelled. Once the design was defined, the geometry of the BIM model was exported to Unreal Engine, where the model’s original materials were replaced with more realistic-looking materials from the Twinmotion Materials Pack collection. Virtual people were also added to simulate the presence of the audience, and different levels of lighting were simulated.

Fig. 1

Simplified floor plan of the auditorium (red crosses represent the possible positions where the musician can be placed)

The virtual humans used were avatars from the Microsoft Rocketbox library (Gonzalez-Franco et al. 2020), which contains 115 animated characters created from real people scanned in 3D. In our experiment, characters wearing conspicuous uniforms, such as firefighters, police or military personnel, were discarded. To fill the 807 seats, several copies of each avatar had to be randomly distributed. For the avatar animations, only those in which the character remained seated were used. The animations were randomly distributed among the different avatars, and the order in which they were played was also randomised to avoid multiple avatars playing the same animation at the same time. Any avatar could play any animation at any time, but the number of avatars playing an animation at any one time was limited to twenty-five per cent of the total to reduce rendering overhead and to avoid excessive movement of the audience. Figure 2 shows a close-up of the characters in two different frames.

Fig. 2

View of the seated avatars at two different times. Left: At the beginning of the scene. Right: After 30 s
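The cap on simultaneously animated avatars described above can be illustrated with a short sketch. This is Python rather than Unreal Engine code, and it is not the authors' implementation: the function name `pick_avatars_to_animate`, its parameters and the rounding rule (truncating 25% of 807 to 201) are assumptions made for illustration only.

```python
import random

MAX_ACTIVE_FRACTION = 0.25  # cap stated in the text: 25% of all avatars


def pick_avatars_to_animate(n_avatars, active, n_requested, rng):
    """Return indices of idle avatars that may start an animation now.

    `active` is the set of avatar indices already playing an animation.
    The cap keeps (already active + newly started) within 25% of n_avatars.
    Hypothetical helper; rounding down the cap is an assumption.
    """
    budget = int(n_avatars * MAX_ACTIVE_FRACTION) - len(active)
    if budget <= 0:
        return []
    idle = [i for i in range(n_avatars) if i not in active]
    # Random choice so that movement is spread over the whole audience.
    return rng.sample(idle, min(budget, n_requested, len(idle)))


rng = random.Random(0)
started = pick_avatars_to_animate(807, active=set(range(150)), n_requested=100, rng=rng)
```

With 150 avatars already moving, at most 51 more are allowed to start, regardless of how many are requested.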

The avatars have relatively low-density meshes of fewer than 10000 triangles. Even so, these meshes were too dense for our experiment, as a large number of avatars had to be rendered, all of them at least 6 m away from the musician. For this reason, we used different levels of detail (LODs), whose differences are, in our opinion, imperceptible from the stage, to optimise the rendering. At LOD 0, the number of triangles was reduced by 75%; this level was used for avatars sitting in the first two rows. For the next 7 rows, the reduction was 87.5%. Finally, in the remaining rows, the reduction went up to 97%. The virtual human seen in the foreground of Fig. 2 was sitting in the first row and is therefore at LOD 0.
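As a rough illustration of the row-based LOD scheme, the following Python sketch maps a seat row to the triangle reduction quoted above and to an approximate remaining triangle count. The function names are hypothetical, rows are assumed to be numbered from 1 starting at the stage, and the figure of 10000 triangles is only the upper bound mentioned in the text, not the size of any particular avatar.

```python
def lod_reduction_for_row(row):
    """Fraction of triangles removed for a 1-based seat row (from the text)."""
    if row < 1:
        raise ValueError("row numbers start at 1")
    if row <= 2:       # first two rows
        return 0.75
    if row <= 9:       # next seven rows
        return 0.875
    return 0.97        # remaining rows


def remaining_triangles(original_triangles, row):
    """Approximate triangle count after the reduction for the given row."""
    return round(original_triangles * (1.0 - lod_reduction_for_row(row)))


# Upper-bound example: a 10000-triangle mesh (assumed value).
front = remaining_triangles(10000, 1)
middle = remaining_triangles(10000, 5)
back = remaining_triangles(10000, 20)
```

Under these assumptions, a mesh at the upper bound would keep about 2500 triangles in the first two rows, 1250 in rows 3 to 9 and 300 further back.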

2.2 Variables of the experiment

As seen in the introduction, the presence of an audience is a stimulus that can aggravate MPA symptoms. In this paper, we investigated two ways to reduce the power of this stimulus without reducing the number of audience members: (1) reducing the visibility of the audience by lowering the ambient light and (2) increasing the distance between the musician and the audience.

With regard to the ambient lighting, three qualitative levels were defined in which the lighting above the audience was varied while the lighting above the stage remained constant:

  • High: The audience was perfectly visible from the stage.

  • Medium: The audience was visible, but their features and expressions were almost indistinguishable.

  • Low: Only a few shadows were distinguishable.

Figure 3 shows the different illumination levels. The image was taken from a position further away from the audience than the participants in the experiment, in order to get a wider view. The stage walls, which are not visible in the image, were modelled with a dark matte material to reduce the light reflected towards the audience. We chose to define the levels qualitatively, rather than to make a precise calculation of the lighting, because the main aim of the study is to determine whether audience-induced anxiety can be reduced by modifying the environment.

Fig. 3

Ambient light intensity levels used in the experiment. a High, b Medium, c Low

To study the effect of the distance to the audience on the musicians, three positions were defined in which the participants could be placed (see Fig. 1). The first position, referred to as "Middle", is in the centre of the stage. The second, referred to as "Far", is 2.5 m from "Middle" in the direction away from the audience. The third, referred to as "Near", is 2.5 m from "Middle" in the direction of the audience.

2.3 Subjects

The subjects were 61 musicians (34 male and 27 female) from the Music Conservatory of Murcia (Spain), of whom 7 were professionals and 54 were students of professional music studies between the second and sixth year. Participants ranged in age from 11 to 54, with a median age of 17. The most frequently used instruments were the clarinet (\(\hbox {n}=20\)) and the violin (\(\hbox {n}=16\)). To a lesser extent, the viola (\(\hbox {n}=8\)), saxophone (\(\hbox {n}=7\)), oboe (\(\hbox {n}=4\)) and trumpet (\(\hbox {n}=2\)) were used, and finally, the cello, bassoon, piano and trombone were used by only one musician each.

All subjects gave their consent to take part in the study. In the case of those under the age of 18, consent was also obtained from their legal guardians. All participants were asked to report their experience of using immersive VR devices; 41 reported no experience of using this type of device, 15 reported limited experience and 5 reported moderate experience.

2.4 Subjective assessment

A Semantic Differential Scaling (SDS) questionnaire was used, in which participants rated different characteristics of the environment on 7-point scales, each anchored by a pair of bipolar adjectives.

A total of 8 word pairs were used to measure two key aspects:

  • Overall assessment of architectural design: inviting/boring (I/B), spacious/tight (S/T), light/dark (L/D), symmetrical/asymmetrical (S/A) and balanced/unbalanced (B/U).

  • The effect of audience presence on the perception of the environment at different distances and lighting conditions: calming/disturbing (C/D), empty/crowded (E/C) and exposed/isolated (E/I).

The words in the first group were selected from some of the most commonly used words to describe changes in architectural features, based on a previous study (Ergan et al. 2018).

On the other hand, the Slater Usoh Steed (SUS) (Slater et al. 1994) questionnaire was used to measure the sense of presence in the virtual environment. This questionnaire consists of 6 items that can be scored from 1 to 7, with a maximum score of 42 (6 items x 7 points). Low scores indicate that the environment is perceived as artificial, while high scores are associated with a strong sense of being in a real environment. This questionnaire does not define thresholds above which the experience can be considered comparable to that of a real environment; however, studies in which simulations with high levels of realism achieved scores around 30 argue that this value reflects a high degree of immersion in the virtual environment (Castilla et al. 2023).

The items of the questionnaire are not standardised and are usually adapted to the characteristics of the environment; the following items were used in the present study:

  • K1: I had a sense of "being there" in the auditorium space: Not at all/Totally.

  • K2: There were times during the experience when the auditorium space was the reality for me: At no time/Almost all the time.

  • K3: The virtual environment was more like...pictures I saw/a place I visited.

  • K4: For the duration of the experience, I had a stronger sense of being...in the auditorium/anywhere else.

  • K5: I think of the virtual environment as a place similar to other places I have been today: Not at all/Totally.

  • K6: During the experience, I often thought I was actually in a real auditorium: Not at all/Totally.
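The total presence score described above is simply the sum of the six item ratings. A minimal Python sketch follows; the function name is hypothetical, and it assumes that every response has already been oriented so that 7 means the strongest sense of presence, which the text does not spell out for each item.

```python
def sus_total(item_scores):
    """Sum of the six SUS item ratings (each 1 to 7), giving a total of 6 to 42.

    Assumes responses are already oriented so that higher = stronger presence.
    """
    if len(item_scores) != 6:
        raise ValueError("the questionnaire has exactly 6 items (K1 to K6)")
    if any(not 1 <= s <= 7 for s in item_scores):
        raise ValueError("each item is rated from 1 to 7")
    return sum(item_scores)


example_total = sus_total([5, 6, 5, 6, 4, 5])
```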

2.5 Eye tracking

The previous questionnaires can be very useful to know the subjective experience of the participants in the built environment. However, they do not allow us to know which elements of the built environment contributed to this experience. By using eye tracking, it is possible to determine where visual attention is focused (Adhanom et al. 2023), which can help to understand the relationship between the features of the environment and the human experience in that environment.

All the objects in the auditorium were grouped into the following four categories in order to simplify the subsequent analysis:

  • Audience: contained virtual people and seats

  • Walls: comprised all walls, pillars and wall lights

  • Ceiling: included ceilings and ceiling lights

  • Floor

The lights were objects that stood out from the rest, so it might have been interesting to include them in a separate category. However, given that the eye tracking system used had an accuracy of between \(0.5^{\circ }\) and \(1.1^{\circ }\) and that the lights were located at a large distance from the subject, it was decided to include them in the wall and ceiling categories, as it was not possible to reliably know how long the lights were being looked at.
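Given per-sample gaze hits labelled with the four categories above, the share of time spent on each category can be estimated as a simple proportion. The sketch below is an assumption about how such a computation might look, not the authors' pipeline: it assumes uniformly sampled gaze data, labels invalid samples as `None`, and uses hypothetical names throughout.

```python
from collections import Counter

CATEGORIES = ("Audience", "Walls", "Ceiling", "Floor")


def dwell_proportions(sample_labels):
    """Fraction of valid gaze samples that landed on each category.

    Samples with any other label (e.g. None for invalid data) are ignored.
    With a constant sampling rate, these fractions approximate the share
    of time spent looking at each category, on a scale from 0 to 1.
    """
    valid = [label for label in sample_labels if label in CATEGORIES]
    if not valid:
        return {c: 0.0 for c in CATEGORIES}
    counts = Counter(valid)
    return {c: counts[c] / len(valid) for c in CATEGORIES}


labels = ["Audience"] * 6 + ["Walls"] * 3 + ["Floor"] + [None] * 2
shares = dwell_proportions(labels)
```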

In addition to analysing visual attention, eye-tracking data can be used to detect changes in emotional arousal associated with changes in anxiety levels. The review by Skaramagkas et al. (2023) identified several studies that found a direct proportional relationship between emotional arousal and pupil diameter, blink frequency and saccadic eye movements.

The SRanipal eye tracking software used directly provides pupil diameter and detects blinks. However, it does not classify eye movements. To detect fast eye movements, called saccades, we calculated the angular velocity of gaze from the vector indicating the direction of gaze. We then applied a median filter to eliminate possible extreme values caused by measurement errors and classified as saccades those samples whose angular velocity exceeded a threshold of 100 degrees/second. Table 1 describes the eye tracking features used in this study.

Table 1 Basic significance of eye-tracking features
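The velocity-threshold saccade detection described above can be sketched in a few lines of Python. Only the 100 degrees/second threshold comes from the text; the median filter window of three samples, the handling of the edges, the function names and the synthetic 90 Hz data are assumptions for illustration.

```python
import math
from statistics import median

SACCADE_THRESHOLD_DEG_S = 100.0  # threshold stated in the text


def angular_velocity(directions, timestamps):
    """Angular speed (deg/s) between consecutive gaze direction vectors."""
    speeds = []
    for a, b, t0, t1 in zip(directions, directions[1:], timestamps, timestamps[1:]):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        cos_angle = max(-1.0, min(1.0, dot / norm))  # clamp rounding errors
        speeds.append(math.degrees(math.acos(cos_angle)) / (t1 - t0))
    return speeds


def median_filter(values, window=3):
    """Sliding median; windows are truncated at the edges (assumed choice)."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def saccade_mask(directions, timestamps, window=3):
    """True for each inter-sample interval classified as part of a saccade."""
    filtered = median_filter(angular_velocity(directions, timestamps), window)
    return [v > SACCADE_THRESHOLD_DEG_S for v in filtered]


# Synthetic gaze at 90 Hz: fixation, a 12 degree rotation over 4 samples, fixation.
angles = [0, 0, 0, 0, 0, 3, 6, 9, 12, 12, 12, 12, 12]
dirs = [(math.sin(math.radians(a)), 0.0, math.cos(math.radians(a))) for a in angles]
times = [i / 90 for i in range(len(angles))]
mask = saccade_mask(dirs, times)
```

A saccade ratio on a scale from 0 to 1 could then be obtained as `sum(mask) / len(mask)`. Filtering the velocity signal rather than the raw vectors is one reading of the description; an isolated single-interval velocity spike is suppressed by the median, whereas a sustained fast movement is kept.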

All of these variables can serve as indicators of anxiety. In our study, however, pupil diameter could not be used for this purpose because it changed with the different lighting conditions. It also changed with the distance to the audience, because the stage was better lit than the rest of the auditorium: when the musician was moved away from the audience, a larger part of the stage entered their visual field. Nevertheless, this variable can provide relevant information on perceived light intensity and was therefore included in the analysis.

The virtual environment was presented to the participants using the HTC Vive Pro Eye HMD. This device has a resolution of \(1440 \times 1600\) per eye with a refresh rate of 90 Hz and a viewing angle of \(110^{\circ }\). Rendering was performed on a PC with 32 GB of RAM and an Nvidia RTX 2070 graphics card.

2.6 Procedure

MPA is an anticipatory response that generally peaks just before the start of the performance and declines throughout the performance (Papageorgi et al. 2011). For this reason, a short experiment was designed in which participants had no time to settle in and could assess the environment at a time close to the theoretical peak of anxiety. A few days before the experiment, participants were given an informed consent form to sign. The experimental procedure is shown in Fig. 4.

Fig. 4

Experimental procedure

After submitting the signed informed consent form, participants completed a demographic information questionnaire. The experimental procedure was then explained to them. They were given the VR controller and told how to use it to interact with the virtual panels. They were then given the HMD and helped to fit it correctly on their head. Once they indicated that they were comfortable with the helmet, the eye tracking calibration began. During this process, the HMD manufacturer’s software checked that the position and interpupillary distance of the device had been set correctly and, if not, instructed the participant on how to adjust it. Once calibration was complete, participants were taken to a virtual waiting room where they were shown a panel similar to the one they would later use to complete the questionnaires. The waiting room was a rectangular room with white walls and no windows or other stimuli. The purpose of this room was to facilitate adaptation to the virtual reality environment.

After 30 s in the room, the repeated measures procedure was initiated. This process started with another 30 s in the waiting room, the last 10 s of which were marked by a countdown visible to the participants, indicating the moment they had to start playing. Participants then had to play a piece of music of their choice in the virtual auditorium for one minute. During this time, they could not see the instrument or the score, which added to the difficulty, so they were given the option to improvise, or even to play scales, if they could not remember a piece of music. Inside the auditorium, participants were placed in one of three possible locations and shown one of three available lighting levels, with both variables chosen randomly to mitigate order effects. After one minute, an acoustic signal indicated that they should stop playing, and they then completed an SDS questionnaire to rate the environment using the VR controller.

A total of three measurements were taken for each participant, each with lighting and a distance to the audience that the participant had not seen before, so that by the end of the experiment each participant had visualised all three levels of both variables. Once the repeated measures were completed, the HMD was removed and participants were asked to complete the SUS questionnaire to rate their sense of presence in the virtual environment. It was decided that this questionnaire should be completed outside of the VR environment, as the questionnaire items relate to recollection of the environment.
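One simple way to obtain three trials in which no lighting level or distance is repeated is to shuffle each variable independently and pair the results. The Python sketch below illustrates this idea; the text does not describe how the randomisation was implemented, so the approach and all names are assumptions.

```python
import random

LIGHTING = ("High", "Medium", "Low")
DISTANCE = ("Near", "Middle", "Far")


def draw_trial_conditions(rng):
    """Return three (lighting, distance) pairs for one participant.

    Each lighting level and each distance appears exactly once, in an
    independently shuffled order (assumed randomisation scheme).
    """
    lighting_order = rng.sample(LIGHTING, k=3)
    distance_order = rng.sample(DISTANCE, k=3)
    return list(zip(lighting_order, distance_order))


trials = draw_trial_conditions(random.Random(42))
```

Note that with this scheme each participant sees only three of the nine possible lighting and distance combinations, which matches the description of three measurements per participant.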

2.7 Data analysis

The data collected included different types of information and scales. The results of the questionnaires were ordinal discrete values on a scale from 1 to 7. The time spent looking at each category of objects and the ratio of saccades were continuous values on a scale from 0 to 1. After examining the histograms of the different variables and performing the Shapiro–Wilk test, we found that most of the variables did not follow a normal distribution and therefore we decided to use nonparametric models for data analysis.

In some of the word pairs used in the SDS questionnaire, such as "calming/disturbing", one of the words had a clear positive connotation and the other a clear negative connotation. However, some of the scales may be more difficult to interpret. For this reason, Spearman’s correlation coefficient was calculated to examine the relationship between the different word pairs, and p values were corrected using the Bonferroni correction.
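For readers unfamiliar with the procedure, Spearman’s coefficient is the Pearson correlation of the ranks, with tied ratings sharing their average rank, and the Bonferroni correction multiplies each p value by the number of tests. The sketch below shows both steps in plain Python. It does not compute p values, and the figure of 28 tests (all pairs among 8 scales) is an assumption about how the correction was applied, not something stated in the text.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_value * n_tests)


n_pairs = 8 * 7 // 2  # 28 pairwise comparisons among 8 scales (assumed)
rho = spearman_rho([1, 2, 3, 4, 5, 6, 7], [7, 6, 5, 4, 3, 2, 1])
```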

The Friedman test was used to determine whether the differences between the repeated measures of the variables used were significant. For those variables where statistically significant differences were found (p<0.05), a post-hoc analysis was performed to determine between which groups these differences occurred. This was done using the two-tailed Wilcoxon signed rank test with Bonferroni correction, followed by a one-tailed analysis to determine the direction of the effect in those pairs where differences were found.
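To make the first step concrete, the following sketch computes the Friedman statistic for exactly three repeated conditions, as in this design. It ranks each participant’s three values, applies the usual tie correction, and uses the fact that the chi-square survival function with 2 degrees of freedom is \(e^{-x/2}\). The function name is hypothetical; in practice a statistics library such as SciPy (`scipy.stats.friedmanchisquare` and `scipy.stats.wilcoxon`) would normally be used, and the post-hoc Wilcoxon step is not reproduced here.

```python
import math
from collections import Counter


def friedman_three_conditions(a, b, c):
    """Friedman chi-square statistic and p value for three paired samples.

    a, b, c hold one value per participant for each condition.
    The p value relies on the chi-square approximation with 2 df.
    """
    if not (len(a) == len(b) == len(c)) or len(a) == 0:
        raise ValueError("need three equal-length, non-empty samples")
    n, k = len(a), 3
    rank_sums = [0.0, 0.0, 0.0]
    tie_term = 0
    for row in zip(a, b, c):
        for j, v in enumerate(row):
            less = sum(1 for u in row if u < v)
            equal = sum(1 for u in row if u == v)
            rank_sums[j] += less + (equal + 1) / 2  # average rank within the row
        tie_term += sum(t ** 3 - t for t in Counter(row).values())
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:  # every participant gave three identical values
        return 0.0, 1.0
    raw = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    stat = raw / correction
    return stat, math.exp(-stat / 2)


# Four hypothetical participants who all order the conditions the same way.
stat, p = friedman_three_conditions([1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3])
```

With perfectly consistent orderings across four participants, the statistic reaches its maximum of \(n(k-1) = 8\).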

3 Results

3.1 Analysis of the sense of presence

The subjects’ sense of presence in the virtual environment was measured using the SUS questionnaire, the results of which for each item are shown in Fig. 5. In general, high scores were obtained with a mean and median above 5 in all cases; however, in the fifth item, there is a large spread in the ratings, which may be due to the type of building evaluated. As it is an auditorium, it is unlikely that the subjects were in a similar place on the same day.

Fig. 5

Box-and-whisker plot of SUS questionnaire score. The triangle indicates the mean value

The mean value of the sum of the questionnaire items was 31.52, with a standard deviation of 6.41, and the quartiles were 27, 36 and 42. Taking as a reference the scores of around 30 reported for simulations with high levels of realism, these results suggest that the participants experienced a high degree of immersion and that their experience in this environment was comparable to what they could have had in the real world.

3.1.1 SDS questionnaire

According to the criterion of Dancey and Reidy (2004), the degree of correlation between the different word pairs was mostly weak (less than 0.4 in absolute value), as shown in Fig. 6.

Fig. 6

Spearman correlation matrix of SDS with correlation coefficients. Significance levels are indicated by asterisks: \(*<0.05\), \(**<0.01\), \(***<0.001\)

Environments rated as more symmetrical and balanced were perceived as more spacious, and the balance of the environment was also related to its symmetry. As expected, environments perceived as more empty were also perceived as more calming. In addition, a negative correlation was observed between the reported degree of exposure (E/I) and the C/D and E/C scales.

Table 2 shows the median and interquartile range (IQR) of the different semantic scales for each lighting condition and distance to the audience. Considering that the midpoint of the scale is 4, the results suggest that the environment was perceived as balanced, exposed, inviting, spacious, symmetrical and slightly crowded. In terms of perceived brightness, the median of the L/D scale was closer to ’light’, except in scenes with low ambient lighting.

Table 2 Median and IQR of each SDS for each lighting and distance to audience, along with p value of Friedman’s test

No statistically significant differences were found in the effect of lighting or distance on the rating of symmetry of the environment or on the I/B scale. On the remaining scales, differences were found in at least one of the two variables. On the C/D scale, the medians seem to indicate that the environment is perceived as more disturbing as the distance to the audience decreases. On other scales, however, it is difficult to deduce how the variables studied influenced the perception of the environment simply by looking at the medians. For this reason, a post-hoc analysis was carried out on the scales where significant differences were found.

Table 3 shows the p value obtained in the post-hoc analysis. As expected, the largest differences were found between the most extreme levels, with the exception of the B/U scale, where significant differences were found only between the position closest to the public and the one in the middle of the stage.

Table 3 Post-hoc analysis of the SDS results: corrected p value of the two-tailed Wilcoxon signed rank test

A one-tailed analysis was performed to determine the direction of the effect (see Table 4). Scenes with high ambient lighting were perceived as more intrusive and exposed than those with low lighting, and also more exposed than those with medium lighting. The table also shows that in the closest position to the audience, the environment was perceived as more disturbing, crowded, exposed and confined than in the farthest position. In addition, the auditorium was perceived as more unbalanced, crowded and confined in the closest position than in the middle position. These results support the hypothesis that the presence of the audience changes the perception of the environment and that it is possible to create less intimidating environments by reducing the visibility of the audience to the musician.

Table 4 Relationships between the different scenes observed in the SDS after performing the one-tailed Wilcoxon signed rank test

On the other hand, the L/D scale reflected the fact that the subjects perceived the changes in the intensity of the ambient light: the lower the ambient illumination, the darker the environment was rated.

3.1.2 Eye tracking

In order to perform the within-subject analysis of the eye tracking data, the data sets of 3 subjects had to be discarded, because in at least one of the scenes the data were considered invalid by the HMD manufacturer’s software.

Changes in lighting and distance to the audience significantly affected gaze behaviour (see Table 5). In terms of visual attention, the audience was the category that participants looked at the longest. However, the time spent looking at this category varied with both audience distance and lighting. These results were expected, since increasing the distance to the audience increases the surface area of walls, floors and ceilings that enters the visual field, and reducing the lighting makes the lights on the walls and ceilings stand out more. Nevertheless, these results are interesting because they indicate that it is possible to reduce attention to the audience, an anxiety-provoking stimulus, by modifying the environment.

Table 5 Median and IQR of each eye tracking variable for each lighting and distance to audience, along with p value of Friedman’s test
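The summary reported in Table 5 (median, IQR and Friedman p value per variable) can be illustrated with the following sketch. It is not the analysis code of the study: the data are invented, the helper names `rank_row` and `friedman_three_levels` are hypothetical, and the function is deliberately restricted to three related conditions, for which the chi-square approximation has two degrees of freedom and the p value reduces to exp(-Q/2). The quartile convention used for the IQR is also an assumption, since several conventions exist. A library routine such as SciPy's `scipy.stats.friedmanchisquare` would normally be preferred.

```python
import math
from statistics import median, quantiles


def rank_row(row):
    """Average ranks (1 = smallest) within one participant's row, plus its tie term."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    tie_term = 0
    i = 0
    while i < len(row):
        j = i
        while j + 1 < len(row) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # average of ranks i+1 .. j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def friedman_three_levels(rows):
    """Tie-corrected Friedman statistic and p value for rows of exactly three conditions."""
    n, k = len(rows), 3
    assert all(len(row) == k for row in rows)
    rank_sums = [0.0] * k
    ties = 0
    for row in rows:
        ranks, tie_term = rank_row(row)
        ties += tie_term
        for j, r in enumerate(ranks):
            rank_sums[j] += r
    q = 12 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)
    correction = 1 - ties / (n * k * (k * k - 1))
    if correction == 0:  # every row fully tied: no evidence of any difference
        return 0.0, 1.0
    q /= correction
    return q, math.exp(-q / 2)  # chi-square survival function for 2 degrees of freedom


# Hypothetical share of time (%) spent looking at the audience under
# (low, medium, high) ambient lighting, one row per participant.
rows = [
    (42, 55, 63), (38, 41, 59), (50, 48, 66), (35, 44, 52), (47, 53, 70),
    (40, 46, 45), (33, 39, 58), (45, 50, 61), (39, 47, 49), (44, 52, 64),
]
for name, column in zip(("low", "medium", "high"), zip(*rows)):
    q1, _, q3 = quantiles(column, n=4)
    print(f"{name}: median = {median(column)}, IQR = {q3 - q1:.1f}")
q, p = friedman_three_levels(rows)
print(f"Friedman Q = {q:.2f}, p = {p:.4f}")
```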

No statistically significant differences in blink frequency were found for either variable, but significant differences were found in pupil diameter and in variables related to eye movement. The simplest explanation for changes in pupil diameter is variations in the intensity of light received by the eye. However, it is important to note that while no significant differences were found in the perception of luminance between the different distances, there were significant changes in pupil diameter, suggesting two possible explanations: (1) the perception of luminance was subjective and did not depend on the intensity of the light received by the eye, or (2) the changes in pupil diameter were caused by variations in the level of anxiety.

Table 6 shows the results of the two-tailed post-hoc analysis of those variables for which the Friedman test revealed statistically significant differences. The analysis of time spent looking at the ceiling was not performed because all the medians were zero, indicating that many of the participants did not look at the ceiling.

Table 6 Post-hoc analysis of the eye tracking results: corrected p value of the two-tailed Wilcoxon signed rank test

For visual attention, significant differences were found between the three lighting levels, except for the time spent looking at the floor between "High" and "Medium". For distance to the audience, differences in looking times were found only between the two most extreme levels. For the remaining eye tracking variables, the Wilcoxon test showed significant differences between all distances, except for the number of saccades per second, where the p value was greater than 0.05 between "Far" and "Medium". With respect to illumination, differences were found in pupil diameter and saccade ratio, the latter only between the two most extreme levels.

A one-tailed analysis showed that the musicians looked longer at the audience as the illumination increased (see Table 7). Similarly, it was observed that the time spent looking at the audience was longer in the position closest to the audience than in the position furthest away.

Table 7 Relationships between the different scenes observed in the eye tracking variables after performing the one-tailed Wilcoxon signed rank test

Pupil diameter behaved as expected, decreasing in scenes with high ambient lighting compared to the other two lighting levels, and also decreasing with increasing distance from the audience, probably because the stage lighting was higher than the ambient lighting.

Regarding the eye movement variables, the saccade ratio was higher in the low ambient light than in the high ambient light. However, as mentioned above, this variable alone is not a sufficient indicator of anxiety, especially considering that the low lighting scenes were reported to be more calming than the high lighting scenes. On the other hand, there was a clear trend in eye movements between the different distances to the audience. As distance decreased, average eye velocity and saccade ratio increased. In addition, an increase in the number of saccades per second was observed when comparing the position closest to the audience with the other two positions. These results, together with those from the SDS questionnaire, suggest a relationship between the distance to the audience and the musicians’ anxiety level.

In conclusion, both the lighting and the distance to the audience altered the perception of the environment and the gaze behaviour of the musicians participating in our study.

4 Discussion

The aim of the present study was to investigate whether it is possible to reduce the anxiety-inducing effect of the presence of the audience on musicians during solo performances by manipulating the built environment. From a methodological point of view, two important aspects should be highlighted: (1) the use of immersive VR and (2) eye-tracking measures.

Virtual reality allows great control over the variables of the experiment and greatly simplifies the logistics by not requiring a real audience. However, the use of an HMD is challenging for several reasons. First, it requires real-time rendering for each eye, which is computationally expensive, especially when a large number of animated objects need to be rendered simultaneously. In our study, we were able to successfully simulate more than 800 virtual humans by creating different LODs. However, there are more efficient techniques, such as the use of impostors (Beacco et al. 2015), that would allow tens of thousands of virtual humans to be rendered if necessary. Secondly, the participants had to play without being able to see the instrument. While there are tracking techniques that would have allowed them to see the instrument in the virtual environment (Serafin et al. 2016), their implementation is instrument-specific and would have been technically very costly. Depriving the musicians of the ability to see the instrument increased the difficulty of the task and, consequently, could have increased their anxiety levels. However, they were given the freedom to play whatever they wanted, to facilitate the task and to some extent compensate for the effect of not being able to see the instrument. Despite this limitation, participants showed no difficulty in manipulating the instrument without seeing it. Finally, the use of the HMD can be jarring when first used, so a virtual waiting room with minimal visual stimuli was used to allow subjects to become accustomed to the VR environment.

On the other hand, the use of eye tracking in the field of architecture is not new, but this technique applied to architectural research is usually only used to study visual attention (Li et al. 2021). Studies in other disciplines suggest that some features of eye tracking can be used to detect changes in anxiety levels (Skaramagkas et al. 2023). In our study, we successfully used both measures, visual attention and anxiety-related parameters, which contributed to a better understanding of the musicians’ experience.

In terms of experimental design, time was a key factor. Considering that previous research suggests that the MPA peak occurs just before the start of the performance (Papageorgi et al. 2011), we designed an experiment that was as short as possible. This meant that the subjects had no time to acclimatise to the environment, and that the audience ratings would take place as close to the anxiety peak as possible. For this reason, we decided that each participant should visualise only three scenes, and we used a questionnaire with a reduced number of items.

Study participants reported a high sense of presence in the virtual environment. However, in the fifth item of the SUS questionnaire, the score was significantly lower than in the other items and the IQR was significantly higher. This suggests that this item may not have been worded correctly, as an auditorium is not a place that is normally visited on a daily basis, and it does not seem appropriate to ask participants whether the place was similar to others they had been in that day.

The results of the experiment showed that changes in ambient light intensity and distance to the audience caused significant differences in both subjective perceptions of the environment and eye behaviour. According to the results of the post-hoc analysis, most of the differences occurred between the most extreme levels, but it can be observed that for most of the SDS and eye tracking variables, the median value of the intermediate level of light intensity or distance to the audience was between the value of the most extreme levels. It should also be noted that the Bonferroni correction was used in the post-hoc analysis, a criterion often considered to be overly conservative (Narum 2006).
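The conservativeness of the Bonferroni correction can be illustrated with a small worked example. The sketch below compares it with Holm's step-down procedure, which is used here purely as a familiar less conservative alternative: it is not the method used in the study, and no claim is made that it is the alternative discussed in the cited reference. The raw p values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply every p value by the number of comparisons (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment: the smallest p is multiplied by m, the next by m - 1, ..."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Adjusted values must not decrease along the ordered sequence.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[idx]))
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p values for three pairwise comparisons (not taken from the study).
raw_p = [0.010, 0.020, 0.030]
for name, adjust in (("Bonferroni", bonferroni), ("Holm", holm)):
    adjusted = adjust(raw_p)
    significant = sum(p < 0.05 for p in adjusted)
    print(f"{name}: {[round(p, 3) for p in adjusted]} -> {significant} of {len(raw_p)} below 0.05")
```

With these hypothetical values, only one comparison remains below 0.05 after the Bonferroni correction, whereas all three do under Holm's procedure, although both control the family-wise error rate.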

Several studies discussed in the introduction relate MPA to the presence of an audience, with a focus on aspects such as the size (Bissonnette et al. 2016; Mostajeran et al. 2020; Orman 2004) and attitude (Pertaub et al. 2002) of the audience. In reality, however, it is not always possible to control the attitude of the audience, and reducing its size may not be feasible due to organisational or economic constraints. To our knowledge, there are no previous studies analysing MPA in relation to neuroarchitecture. The most similar to ours is that conducted by Fich et al. (2014), who investigated the effect of characteristics of the built environment on psychological stress. Our results are in line with these previous investigations, confirming that looking at the audience causes the environment to be perceived as more disturbing and that the gaze reflects more anxious behaviour. Furthermore, our results suggest that it is possible to alter the effect of the audience on the musicians by manipulating the characteristics of the environment, without having to reduce the number of spectators.

The results of the present work also show that the scenes in which less time was spent looking at the audience were rated as more calming and isolated. This could be attributed to avoidance behaviour, but if this were the case, one would expect to see avoidance behaviour in the scenes with high lighting and reduced distance to the audience, where musicians are more aware of the presence of the audience. In our experiment, the calming effect was achieved by reducing the ambient lighting and also by increasing the distance to the audience. However, in real auditoriums, increasing the distance between the musician and the audience could have a negative effect on the audience’s experience. It is therefore necessary for future research to explore other ways of reducing the impact of the audience on the musician that do not degrade the audience experience. Features of a building, such as curved shapes or the presence of natural elements, have been shown in previous studies to have positive effects on people’s psychological well-being (Assem et al. 2023), and future research could explore their effects on MPA.

5 Limitations

This paper demonstrates the potential of the built environment to enhance the experience of musicians during performance. However, it is important to note the limitations of the experiment conducted.

Performance anxiety questionnaires were not used to assess participants’ anxiety levels in order to reduce the number of test items, which is a limitation and makes it difficult to compare results with other studies. Similarly, we chose to ask participants to rate the level of calm or disturbance they experienced in the environment, rather than asking them to report their anxiety level directly, thus avoiding asking them to report changes in anxiety level that they perceived as unrelated to the environment.

There are several limitations to using an HMD. The brightness, resolution and viewing angle of the device are lower than what the human eye can perceive. It also obstructs the musician’s view of the instrument, which can affect performance. In addition, the weight of the headset and the volume it takes up can cause minor discomfort when playing musical instruments, especially those that are held very close to the head, such as the violin. However, it is important to note that no situations were observed during the experiment in which the HMD came into contact with the hands or the musical instrument.

With regard to the difficulty of the task, since each musician’s ability to play "blind" is different, it was not possible to control the level of difficulty imposed on each subject. The possibility to improvise or even to play scales could influence the level of challenge and consequently the anxiety caused by the fear of making a mistake. In this context, it is crucial to emphasise that the possibility of improvising or playing scales was offered to those subjects with a lower level of mastery of the instrument, who expressed their inability to play a song without visualising the instrument or the notes, in an attempt to standardise as much as possible the level of difficulty of the task for all musicians.

Another limitation of the study is that the size of the audience was not addressed, which limits the generalisability of the results. With regard to the assessment of MPA, it is important to note that this condition can manifest itself in different ways. In our study, we focused on psychological arousal as an indicator of anxiety, which is a limitation.

6 Conclusions

This study investigated the effect of ambient lighting and distance from the audience on musicians during solo performances in relation to anxiety caused by the presence of the audience. The aim was to determine whether it is possible to create performance spaces that are more reassuring to musicians without reducing seating capacity. To this end, an experiment was conducted in which conservatory musicians were presented, through immersive virtual reality, with different variants of an auditorium while playing their instrument.

This virtual environment created a high sense of presence in the subjects, even though the exposure time to this environment was short. Both the subjective and eye-tracking results point in the same direction: introducing changes in the environment that reduce the visibility of the musician to the audience leads to a more positive perception of the environment.

Although this study has some limitations that invite further research, the methodology used may be of interest for future research in the field of neuroarchitecture. In addition, the results of the study can be used by architects and engineers to establish design guidelines for auditoriums that can enhance the performance experience of musicians.