1 Introduction

Virtual reality (VR) environments have become widespread among younger users and are mostly used for gaming and recreational activities (Jarvinen 2017; VR 2023). In the last decade, a growing number of fully-VR or VR-compatible games and tools have emerged on various platforms, including Meta, Steam, and PlayStation, and the number is expected to grow further (Ma et al. 2014; Epp et al. 2021). A key factor in the success of VR is the increasing affordability of head-mounted devices (HMDs), with Meta Quest leading the way with over 20 million units sold in the last three years (Heath 2023; Loveridge 2023). While its popularity (Steam 2023) is influenced by the considerably lower price compared to other HMDs, the Quest’s ability to perform as a standalone device differentiates it from the company’s previous models and other manufacturers’ devices, such as HTC Vive and PlayStation VR (Nagta et al. 2022). The increased accessibility of the device opened new opportunities for its use beyond gaming. For example, several of the most popular games on the Meta Quest store are used for exercise, or exergaming (Szpak et al. 2020; Faric et al. 2019). Exergaming has previously been shown to increase the duration of sessions, which results in increased popularity (Moholdt et al. 2017; Sween et al. 2014). On the Quest platform, games such as Beat Saber and Pistol Whip, which base the user action on physical movements synchronised with the musical rhythm, have among the largest player bases (Nair et al. 2023).

Along with the growing popularity of the VR devices, a variety of educational tools for VR has also emerged (Oyelere et al. 2020), moving from traditional web and mobile applications to VR environments. HMDs are used as serious immersive learning tools, such as productivity-oriented (Zeller and Barfuss 2022; Pérez et al. 2019), educational (Checa and Bustillo 2020; Rojas-Sánchez et al. 2022) and health applications (Tao et al. 2021; Wang et al. 2022; Queirós et al. 2023). Examining the effects of VR environments revealed several advantages, notably a substantial enhancement in learner engagement, particularly when preceded by orientation in virtual reality interaction (Christopoulos et al. 2018). Clear potential of VR environments was shown while exploring the learning outcomes and disadvantages of use for specific topics, such as music training scenarios (Innocenti et al. 2019).

1.1 VR in music training

To explore the potential of VR environments for rhythmic skill training, we developed a rhythmic game named Steady the drums!, which is based on a well-known tower-defense game scenario. The main user actions consist of defending a tower from the oncoming enemies. The user guides their troops by playing different rhythmic patterns on two drums. The game builds on speed-correlated hit rate motivation of rhythm-based games, focusing on accuracy, short-term memory and ability to adapt to different tempos. We observed the participants’ performance while playing four game scenarios: following a fixed tempo, following a time-varying tempo, and two variants of pattern repetition. We analyzed how playing the four scenarios of the game daily for 14 days affected children’s rhythmic perception and performance. We evaluated the effects using a pre-test and a post-test with the Tapping-PROMS test battery (Georgi et al. 2023), while the user experience was analyzed with a User Experience Questionnaire (UEQ), along with additional questions about the participants’ experience with the headset. The presented experiment therefore evaluates the effectiveness of a VR game for rhythmic training as a novel approach compared to more conventional e-learning tools, such as mobile and web applications, using the motor and sensory modalities of VR for a better user experience and increased motivation.

The remainder of the paper is organized as follows: we first provide an overview of related research, followed by a description of the developed game. We then describe the experimental design, consisting of pre- and post-tests, a user experience questionnaire, and user feedback. We continue with the analysis of the collected data, followed by a discussion of the results. We conclude this paper with the results of this study and future plans for game development, along with its use in e-learning and remote music education environments. As a residual result of this research, the VR game, along with its source code, is made available to enable further development of the game by the community.

2 Related work

The acquisition of musical skills encompasses a diverse set of challenges, including instrumental proficiency, music theory comprehension, and ear training (Wallentin et al. 2010; Leite et al. 2016; Correia et al. 2022). Among these skills, rhythm plays a crucial role and requires substantial practice and theoretical knowledge, regardless of the instrument being learned. However, practicing rhythmic exercises outside the classroom can be monotonous, especially for school-age children. This challenge has been previously tackled with the use of digital tools, such as mobile applications and web platforms (Waddell and Williamon 2019; Pesek et al. 2020a).

2.1 Rhythmic skills

A variety of music-related games and applications are available on the web and mobile platforms (Bégel et al. 2017), ranging from music training to gamified exercises, rehabilitation and well-being (Vargas et al. 2020; Miskinis et al. 2021; Bella 2022). In their study, Bonacina et al. (2019) explored rhythmic skills in school-age children (ages 5–8 years), and investigated the relationships among four rhythmic tasks hypothesized to reflect different clusters of skills. These tasks were drumming to an isochronous beat, rhythmic pattern memorization, drumming to the beat in music, and clapping in time with visual feedback. The study found no significant relationship between drumming to a beat and rhythmic pattern memorization. However, clapping in time with visual feedback was found to correlate with performance on the other three rhythm tasks. Moreover, while performance on drumming to the beat in music did not change, the other rhythm skills improved over time.

Serafin et al. (2017) explored the considerations on the use of VR and AR in music education, highlighting that Virtual Reality Musical Instruments (VRMIs) have not gained significant attention in the music sphere. The study identified the value of VR in training rhythmical skills, playing together while being apart, addressing stage fear, teaching composition and music production, and developing STEAM skills through programming.

A study by Tierney et al. (2017) delved into the examination of individual differences in rhythmic skills and their associations with neural consistency and linguistic proficiency. In this study, 64 participants were engaged in a series of tests related to rhythm, language, and perception. Of particular interest to our research, the researchers identified two distinct clusters of rhythmic skills when analyzing tasks such as synchronization, tempo adaptation, timing adaptation, beat synchronization, sequence memory, and drumming to sequences. Notably, the first three tasks exhibited strong correlations with each other but not with sequence memory or drumming to sequence tests. Furthermore, the beat synchronization test displayed significant correlations with all rhythm tests except sequence memory. Utilizing factor analysis, the researchers derived two distinct clusters of rhythmic skills categorized as sequencing and synchronization. Drawing upon these findings, we incorporated four primary rhythmic tasks, namely beat tapping, tempo adaptation, drumming to sequences, and repeating a sequence, within our VR application, with two tasks assigned to each cluster of rhythmic skills.

2.2 AR/VR music approaches

A study by Shahab et al. (2022) focused on the use of VR for children with autism, specifically to improve their social skills by simulating the real world. They designed a virtual music classroom for musical rehabilitation, consisting of two virtual humanoid robots and virtual musical instruments for each of the robots (a xylophone and a drum). The study was conducted over a 20-week period with 5 children aged 6 to 8 years. They used various assessment tools to examine the effectiveness of the education for children with ASD. Human assessments included a psychologist examining the children’s ability to recognize/express colors, performance on the Stambak Rhythmic Structures Reproduction Test to assess the children’s musical skills, and a quantitative/qualitative assessment of the children’s social and cognitive skills that focused on imitation and joint attention. The results showed an overall upward trend in the participants’ musical abilities, but no significant improvement in the children’s cognitive abilities.

A VR application called Teach me drums was developed by Moth-Poulsen et al. (2019). The application was specifically tailored for learning hand drums and focuses on the process of learning to play drums. The application is based on pre-recorded 360-degree videos and does not utilize other VR immersive components. They compared a control group using a simple 2D application with visual feedback to the experimental group using a VR headset. Since no additional feedback was provided, the two experimental setups were primarily designed to investigate the effects of a drum teacher’s first-person perspective on beginner rhythm learning. Results from 35 participants showed no significant differences in either the subjective self-assessment questionnaire or rhythmic accuracy. It is therefore important to point out that they focused on analyzing the first-person perspective of a VR headset and neglected to fully exploit its immersive capabilities. The applicability of VR to music education remains to be explored.

Feedback in a virtual environment is as important as the virtual scene. Moon et al. (2022) analyzed the usability and performance of hand tracking, controllers, and a developed feedback device to investigate enhanced presence and engagement in a rhythmic game. Their results reveal the importance of haptic feedback to the type of rhythmic play. Keeler (2020) investigates the effects of video games on rhythmic performance in music education. The study compares non-VR and VR games with pre- and post-tests to measure rhythmic performance and beat proficiency using Flohr’s Rhythm Performance Test Revised (2004) and the Short Flow State Scale-2 to measure flow, with a relatively small user study of 8 participants showing little difference in rhythmic performance between non-VR and VR games.

Pinkl and Cohen (2022) proposed a VR action observation tool for rhythmic coordination training. The study explores three perspective options, including a prerecorded monoscopic spherical video scene, a first-person perspective scene, and a third option showing prerecorded arm and hand movements. The goal is to learn by experiencing the executed gestures of the rhythms in the first person and mimicking the play of the sticks or hands in the virtual scene or video. In their second work, Pinkl and Cohen (2023) provide an extension of their setup by introducing simultaneous feedback and multimodal cues to enhance the user’s immersion. This significantly extends their previously developed drumming practice tools, but without a user study it is difficult to assess the effectiveness of such a system in a practical teaching context.

One of the most difficult requirements of practicing musical skills is the student’s ability to self-assess their performance without feedback from a professional. Johnson et al. (2020) studied computer-assisted musical instrument tutoring (CAMIT) for mixed reality (MR) by developing MR:emin, an immersive MR music learning environment for the theremin, an electronic musical instrument controlled without physical contact. They conducted a user study with 30 participants using three environments: NoVis, with auditory feedback only, and NoImm and Imm, both with visual and auditory feedback, the former presented on a high-resolution display and the latter on an HMD. While both NoImm and Imm environments with visual feedback showed more accurate performance during training, they actually resulted in a slightly lower percentage improvement in learning transfer. The researchers suggested that this was due to the addition of strong visual elements, which caused participants to rely more heavily on visual than auditory feedback. Post-test interviews confirmed this conjecture. Participants indicated that they relied more heavily on tuning the virtual hand position to the visual feedback than to the auditory feedback. Although this was not as clear in the UEQ, participants preferred the Imm environment and found it much more engaging. Even though the results of the study are more closely related to teaching instruments than rhythmic skills, we followed some of the important guidelines they mentioned that were applicable to our research objectives. We designed the tasks so that participants had to focus on auditory feedback rather than visual feedback. The visual cues served only as additional guidance during training and did not unnecessarily distract users with a stronger musical background.

2.3 Assessing rhythmic skills and user experience

For any user study focusing on music learning and education, it is important to choose an appropriate method to assess the musical skill of participants. Although an individual’s formal or informal musical training provides some insight, it does not adequately cover differences in innate abilities among musically untrained participants. Law and Zentner (2012) proposed a new test to better measure musical perceptual skills, called the Profile of Music Perception Skills (PROMS), which covers several domains: tonal, temporal, qualitative, and dynamic. They validated their test battery against the established tests of Gordon’s Advanced Measures of Music and Musical Aptitude Profile with a user study involving 56 listeners, and evaluated a short version of the test in an independent study by Kunert et al. (2016). In subsequent years, the same group of researchers also developed shorter versions of the test battery in the form of the Short-PROMS and the Mini-PROMS (Zentner and Strauss 2017). For the purpose of this study, it was important to focus only on rhythmic skills, so we decided to use a recently introduced version of the PROMS test, called the Tapping-PROMS (Georgi et al. 2023). In it, the researchers presented a standardized rhythm tapping test that includes test items of varying complexity by using the rhythm and tempo subtests of the PROMS test battery. They validated their approach by developing a publicly available Python application and administering the tapping test to 40 participants. We used their application in our user study as part of the pre-test and post-test.

While significant progress has been made in recent years in the interdisciplinary fields of e-learning, music education, and VR, a review of related work has revealed several avenues for further exploration of VR environments for individuals’ musical abilities. Evaluating the impact of VR on participants’ performance could greatly benefit from analyzing their performance using a validated instrument such as the PROMS test. Moreover, the use of VR in a self-learning scenario also opens a new opportunity to assess user motivation and weigh it against the remaining challenges of this new learning environment, as mentioned earlier in the review (e.g., Christopoulos et al. 2018). In addition, measuring participants’ core performance rather than academic performance provides a different perspective to evaluate how such VR games can be used for well-being and rehabilitation scenarios (e.g., Vargas et al. 2020). Finally, we expect that the evaluation of user experience in this research will highlight remaining challenges of VR use for learning that should be addressed to achieve greater adoption in existing learning processes.

3 Steady the drums! game

Steady the drums! is a rhythm-based VR game based on a tower defense scenario. The game is a mixture of real-time strategy and rhythm game, where the player uses VR controllers to play rhythmic patterns on virtual drums with haptic and visual feedback in order to perform various actions, such as attacking, defending and summoning different types of soldiers. Enemies appear at regular intervals in successive waves, with each wave increasing in number and/or strength. To counteract the increasing strength of the attack, the player can spawn more units using new patterns that are unlocked after correctly playing five consecutive patterns. The user’s primary view of the game is shown in Fig. 1.

Fig. 1 The primary user’s view. The goal of the game is to defend the castle using soldiers that are controlled by different rhythmic patterns played by the user. The available patterns are displayed in the center of the view. The grayed out patterns become available when the Combo of value five or more is reached. The Damage multiplier changes with each consecutive pattern, depending on the accuracy of the hit pattern

Fig. 2 Screenshots of the Steady the drums! game. Subfigure a shows the player’s position behind the drums, the soldiers (bottom center) and the upcoming enemies (top center). Subfigure b shows interactions between the soldiers and the enemies from a different camera position

We developed our VR environment in a free version of the Unity 2021.3.16f game engine using the Oculus Unity Integration Toolkit, which was later ported to the OpenXR Toolkit. It supports all currently available Oculus/Meta Quest devices (Quest 1, Quest 2 and Quest Pro). The assets used in the game were acquired through the Unity Assets Store. The framework contains all the necessary building blocks developed specifically for the Meta Rift, Meta Quest 1 and 2 platforms. Currently, version 0.5 of the game is available through the Oculus App Lab, released on November 2, 2022, and can be installed on all Quest devices without restrictions. The final released version is shown in Fig. 2. The game does not require an active Internet connection, so it can be used in situations without Internet access, such as schools where network access is intentionally disabled.

3.1 Interaction design

The interaction design of our game is based on general gamification principles that have been adapted for virtual reality. We provide players with both short-term feedback and long-term rewards to maintain their motivation and performance. Visual and haptic feedback responds to the accuracy of their play, while a combat counter rewards accurate performance by increasing the player’s army damage and offering special command patterns to spawn new units. In addition, the game introduces new instruments within the orchestral background music, serving as a rhythmic guide for players to execute different patterns to command their army.

On the other hand, our educational version deviates from these elements. The strategic element of the game was removed as the player’s army performed the most valued action each time the player successfully completed a rhythmic task (described in detail in Sect. 3.2). Consequently, while retaining the visual cues and haptic feedback, we omitted the combat counter. In its place, players now receive a battle cry sound effect when their army performs an action following the successful completion of a task.

The inclusion of the user’s motor skills, which are implemented in our game, can also be further supported by neuroscientific research. The literature review by Vuust et al. (2022) shows that there is growing evidence that Active Inference can orchestrate our rhythmic experiences, harmonizing perception, prediction and motor engagement. Additionally, current work on AI (Artificial Intelligence) engineering and operations offers hints for operationalizing the generation and deployment of rhythm action games (Takada et al. 2023). Similarly, we employed the Troubadour platform in this experiment to utilize machine-learning approaches for pseudo-random sequence generation.

3.2 Educational scenarios

In addition to the published game, we began developing several guided learning scenarios embedded in an enhanced version of the game (to be released as an update in the Oculus App Lab). Four different scenarios were developed. The first scenario involves basic rhythmic repetitions of drumming with both hands while following the given tempo. The second scenario extends repetitive drumming to an alternating tempo, alternating between the basic (slower) tempo and a (30 %) faster tempo. The third scenario asks the user to repeat a two-bar rhythmic pattern, supporting the user with a basic tempo and a visual cue of the played position within the pattern. Finally, the fourth scenario extends the repetition of the pattern by not playing the underlying audio track when the pattern is repeated by the user. The selected scenarios relate to the four aspects of rhythm assessed in the study by Bonacina et al. (2019).
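The timing structure of the second scenario can be illustrated with a minimal sketch. It is not taken from the game's source code: the function name, the base tempo of 60 beats per minute, and the number of beats per segment are assumptions made for illustration, and only the 30 % tempo increase comes from the description above.

```python
def alternating_tempo_onsets(base_bpm=60.0, speedup=0.30,
                             beats_per_segment=8, segments=4):
    """Return expected beat onset times in seconds for a tempo that
    alternates between base_bpm and a tempo that is `speedup` faster.

    All parameter defaults are illustrative assumptions.
    """
    onsets = []
    t = 0.0
    for segment in range(segments):
        # Even segments use the base tempo, odd segments the faster one.
        bpm = base_bpm if segment % 2 == 0 else base_bpm * (1.0 + speedup)
        interval = 60.0 / bpm  # seconds between consecutive beats
        for _ in range(beats_per_segment):
            onsets.append(t)
            t += interval
    return onsets
```

With the defaults, the beat interval switches between 1.0 s and roughly 0.77 s, so the player has to re-synchronise at every segment boundary.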

Fig. 3 Screenshot of the educational version of the Steady the drums! game. The instructions and visual cues are written in Slovenian, as this was the mother tongue of the users. Figure presents the user interface as seen with the pattern repetition scenario. Blacked-out rhythm symbols represent rests, and the red/green marked symbols represent the current position within the pattern, synchronised with music

The gameplay of the educational version (shown in Fig. 3) consists of a tutorial and five game levels. All levels include the four scenarios, varying in the difficulty of patterns: easy (tutorial, level 1), moderate (levels 2–4), and difficult (level 5). To enhance player engagement, the types and damage of enemies change as the levels progress, increasing the difficulty. Additionally, the tasks with which the players must engage are randomly distributed within each difficulty level, while still maintaining an equal distribution of the four described task scenarios throughout each play session.
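One simple way to obtain a random task order with an equal share of each scenario is to build a balanced list and shuffle it. The sketch below illustrates this idea only; the scenario labels, the function name, and the number of tasks per scenario are hypothetical and do not describe the game's actual scheduler.

```python
import random

# Hypothetical labels for the four scenarios described in Sect. 3.2.
SCENARIOS = (
    "fixed_tempo",
    "alternating_tempo",
    "pattern_with_audio",
    "pattern_without_audio",
)


def build_session(tasks_per_scenario=3, seed=None):
    """Return a randomly ordered task list in which every scenario
    appears exactly `tasks_per_scenario` times."""
    rng = random.Random(seed)  # local generator keeps runs reproducible
    tasks = [s for s in SCENARIOS for _ in range(tasks_per_scenario)]
    rng.shuffle(tasks)
    return tasks
```

Shuffling a pre-balanced list guarantees the equal distribution by construction, whereas sampling each task independently would only balance the scenarios on average.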

3.3 Generating the patterns and obtaining the user data

To generate the rhythmic patterns for the educational scenarios, we used the automatic rhythmic exercise generator of the Troubadour platform. The Troubadour platform (Pesek et al. 2020a, b, 2022) is an open-source online platform for music theory and ear training that includes exercises for melodic, rhythmic, and harmonic dictation. Since the platform is capable of generating patterns at different difficulty levels, we implemented an API interface to the platform that generates rhythmic patterns. The generated patterns are stored in the game and can be used within the game even when the VR headset is not connected to the Internet.

The enhanced version of the game, once installed on a Quest device, includes five different user accounts to accommodate multiple users playing individually on a single device, usually within the same household. While playing, game data is stored on the device. If an Internet connection is available, data about the user’s performance is transmitted to the game server. The most important aspects stored for later analysis are the accuracy of the user’s drumbeats, and the direction and speed of the user’s hand movements, which we capture via the motion sensors in the Quest controllers. To evaluate user performance, we also captured the time at which the player should have hit the drum, the actual time of a hit, the ID of the drum played, and the current task and session. To capture controller motion data, we recorded the speed and rotation of the controllers at the time of the beat, primarily to observe the force of the player’s beating motions.
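The logged quantities listed above can be pictured as one record per drum hit, buffered locally and forwarded when a connection is available. The following is a minimal sketch under stated assumptions: the class and field names, the tuple layout for velocity and rotation, and the JSON serialisation are all hypothetical and are not taken from the game or its server.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class HitEvent:
    """One drum hit; field names are illustrative assumptions."""
    session_id: int
    task_id: int
    drum_id: int
    expected_time: float   # when the drum should have been hit (s)
    actual_time: float     # when the drum was actually hit (s)
    velocity: tuple        # controller velocity at the hit (x, y, z)
    rotation: tuple        # controller rotation at the hit (assumed quaternion)

    @property
    def error(self):
        """Signed timing error in seconds (positive means late)."""
        return self.actual_time - self.expected_time


class EventBuffer:
    """Keeps serialised events on the device until they can be sent."""

    def __init__(self):
        self.pending = []

    def record(self, event):
        self.pending.append(json.dumps(asdict(event)))

    def flush(self, send, online):
        """Send pending events with `send` if online; return the count sent."""
        if not online:
            return 0
        sent = 0
        while self.pending:
            send(self.pending[0])
            self.pending.pop(0)  # remove only after a successful send
            sent += 1
        return sent
```

Removing an event only after `send` returns means that a failed transmission leaves the record in the local buffer for a later attempt.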

4 Experimental setup

Fig. 4 User study plan schema consisting of pre-test phase (30–60 min), testing phase (14 days) and post-test phase (10–20 min)

In this study, we investigated the effect of playing a VR educational game on the development of rhythmic skills in children aged 7–15 years. The children’s musical backgrounds ranged from no musical training to 5 years of musical training. We wanted to find out if playing the game regularly over a period of time affects rhythmic skills. We also assessed the overall experience of regularly using a VR headset and our educational game. Three different instruments were used in the evaluation: the Tapping-PROMS test, the general questionnaire on the use of games and VR environments, and the User Experience Questionnaire (UEQ) to evaluate the game itself. The Tapping-PROMS test was used to assess participants’ rhythmic abilities before (pre-test) and after (post-test) playing the game. The test included only the temporal aspects (rhythm, rhythm-to-melody, accent) relevant to our study. The pre-study questionnaire included questions about the participants’ age, music experience, and gaming/VR experience. The post-study questionnaire included questions about their game experience (favourite and most difficult scenarios), overall experience with the virtual reality headset, and the UEQ questionnaire to assess overall user experience.

The experiment was carried out individually. It consisted of the following three phases (see Fig. 4):

  • first face-to-face meeting: Pre-study questionnaire and Tapping-PROMS evaluation. The children were then given a thorough introduction to the VR game in individual one-hour sessions. These sessions included a tutorial led by a member of the research team, with both a child and a parent present. A member of the research team was also available to answer additional questions about using the device and navigating the VR environment.

  • 14-day use of the device in home environment, playing the educational game for at least 15 min per day,

  • second face-to-face meeting: Post-study questionnaire and Tapping-PROMS evaluation.

During the 14-day trial period, participants were given a Meta Quest 2 device to use at home. Their parents were given instructions on how to operate the device, troubleshoot basic issues (e.g., chromecasting), and manage the device account (using the account credentials provided to them). Participants’ actions in the game (controller movements, accuracy of input for an individual exercise) were transmitted to the game server or stored locally on the device if an Internet connection was not available.

4.1 Measurement instruments and data collection

4.1.1 Rhythm performance

Rhythm performance was measured using the Tapping-PROMS rhythm test battery. We chose it because it can adequately handle both the isochronous pulse tasks covered by other popular test batteries, such as the Battery for the Assessment of Auditory Sensorimotor and Timing Abilities (BAASTA) and the Harvard Beat Assessment Test (H-BAT), and the reproduction of various complex rhythms. The reproduction accuracy of an isochronous beat sequence can be assessed by calculating the mean absolute asynchronies between the taps and the pacing stimuli. For the reproduction of rhythmic patterns, the analysis is more difficult, which is why a comparison and alignment software is provided. It decomposes the reproduced pattern into an analysis of synchrony and structural correctness. The former measures the reaction time deviation of the inter-tap-interval (ITI) from the respective inter-onset-interval (IOI), and the latter the structural correctness as the full proportion of correctly tapped ITIs in relation to the IOIs. For the former, absolute timings are used, while for the latter, relative timings are used. Inter-tap-interval timings are transformed into a relative range from 0 for the first tap up to 1 for the last tap and compared with relative IOIs.
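The two kinds of measures can be illustrated with a simplified sketch. It is not the Tapping-PROMS comparison and alignment software: it assumes exactly one tap per stimulus onset, already paired by index, whereas the real tool also has to handle missing and extra taps. All function names are hypothetical.

```python
def mean_absolute_asynchrony(taps, onsets):
    """Mean |tap - onset| for sequences paired by index (absolute timing)."""
    if len(taps) != len(onsets) or not taps:
        raise ValueError("taps and onsets must be non-empty and equally long")
    return sum(abs(t - o) for t, o in zip(taps, onsets)) / len(taps)


def relative_positions(times):
    """Rescale times so the first event maps to 0 and the last to 1."""
    span = times[-1] - times[0]
    if span <= 0:
        raise ValueError("need at least two events in increasing order")
    return [(t - times[0]) / span for t in times]


def mean_relative_deviation(taps, onsets):
    """Compare the rhythmic structure only, independent of overall tempo."""
    return mean_absolute_asynchrony(relative_positions(taps),
                                    relative_positions(onsets))
```

A pattern tapped at half speed but with correct proportions yields a large absolute asynchrony and a relative deviation of zero, which is the distinction between synchrony and structural correctness drawn above.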

We used the standalone PsychoPy desktop application to administer the test battery used in the pre-test and post-test sessions of our user study. Since we were investigating a younger demographic test group, on which none of the PROMS battery tests have been extensively tested, we chose to shorten the provided Tapping-PROMS test, primarily to keep the participants engaged across the pre-test and post-test sessions. For each of the rhythm and tempo phases of the administered tests, we chose three patterns of varying difficulty, with an additional tutorial pattern at the beginning of each test phase.

4.1.2 User experience

User experience was measured using the short version of the User Experience Questionnaire (Schrepp et al. 2017). The original version of the UEQ was designed with six scales in mind (attractiveness, perspicuity, efficiency, dependability, stimulation, and novelty), containing 26 items that the participants rate on a 7-point Likert scale. Each item consists of a pair of terms with opposite meanings, with alternating items starting with the positive and negative term. All six scales are condensed into two meta-dimensions: pragmatic and hedonic quality. The short UEQ focuses only on these two dimensions, providing four items for each. While we lose detailed feedback concerning six different aspects of user experience, it made sense to shorten the questionnaire as much as possible, given that we designed our study for children.
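Scoring the short questionnaire amounts to averaging the rescaled items per dimension. The sketch below assumes that each of the eight items is answered on a 1 to 7 scale with the positive term on the right, so that subtracting 4 maps answers to the range from -3 to +3, and that the first four items belong to pragmatic quality and the last four to hedonic quality. These assumptions and the function name are illustrative and should be checked against the questionnaire version actually administered.

```python
def score_ueq_short(responses):
    """Return pragmatic, hedonic and overall scores for one participant.

    `responses` holds eight answers in questionnaire order, each 1..7.
    """
    if len(responses) != 8 or any(not 1 <= r <= 7 for r in responses):
        raise ValueError("expected eight answers between 1 and 7")
    scaled = [r - 4 for r in responses]  # map 1..7 to -3..+3
    return {
        "pragmatic": sum(scaled[:4]) / 4,
        "hedonic": sum(scaled[4:]) / 4,
        "overall": sum(scaled) / 8,
    }
```

Aggregated results for a group are then obtained by averaging these per-participant scores.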

4.1.3 Play accuracy

Play accuracy is the in-game performance of the participants that we measured during the 14-day playtime period. We tracked the performance as the absolute distance of the played timing from the correct timing. A beat in the game was registered as correct if the absolute difference remained below 0.125 s, a threshold we chose empirically based on the timings of all generated patterns and their tempo (60 beats per minute). Along with beat timings, we tracked the drum that was hit on every beat and the velocity and rotation of the controllers on the beat.
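The correctness criterion can be written down directly. In the following minimal sketch only the 0.125 s threshold comes from the text; the function names and the rule that an expected beat counts as correct when any hit falls inside its window are assumptions for illustration.

```python
TOLERANCE_S = 0.125  # threshold stated above, in seconds


def is_correct_hit(expected_time, actual_time, tolerance=TOLERANCE_S):
    """A hit is correct if it lies strictly within the tolerance window."""
    return abs(actual_time - expected_time) < tolerance


def play_accuracy(expected_times, hit_times, tolerance=TOLERANCE_S):
    """Fraction of expected beats matched by at least one hit."""
    if not expected_times:
        raise ValueError("expected_times must not be empty")
    correct = sum(
        1 for e in expected_times
        if any(is_correct_hit(e, h, tolerance) for h in hit_times)
    )
    return correct / len(expected_times)
```

At 60 beats per minute a beat lasts 1 s, so the window of 0.125 s on either side corresponds to a thirty-second note and does not overlap between neighbouring sixteenth-note positions.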

5 Results

We collected the results of the rhythm tests, gameplay and questionnaires between March and September 2023.

5.1 Participants

Fig. 5 Distribution of participants based on age

Fig. 6 Distribution of participants based on video game playtime per week

Fig. 7 Distribution of participants categorized by the number of years they have attended music school

Fourteen participants between the ages of 7 and 14 took part in the study. The age distribution of the participants is shown in Fig. 5. All participants attend elementary school. Figure 6 shows that most participants (13) play video games, 11 of them for up to 3 h per week, and only one participant does not play video games. Most of them play video games either on a PC (8) or a smartphone (7). A large proportion of participants (9) have used a VR headset before, whereas most (11) have never used an MR headset, and only two participants use such headsets regularly. More than half of the participants (8) were not enrolled in music school (Fig. 7), 3 participants have attended music school for a period of 1 to 3 years, while 3 participants have attended music school for more than 5 years. In addition, 8 participants have acquired some skills in playing an instrument, while 5 participants have never learned to play an instrument.

5.2 Tapping-PROMS rhythm test

The Tapping-PROMS rhythm test assesses rhythm performance based on the relative or absolute asynchrony of the reproduced rhythm compared to the given stimuli. We observe whether statistically significant changes in participant performance occur after 14 days of playing the VR game. We used the paired Student’s t-test, which is designed to compare the same variable across repeated measurements of the same subjects, with the first test serving as the control. When interpreting the results of the test, a lower score denotes better performance. In our analysis, we assume that the mean of both the absolute and relative asynchrony of the Tapping-PROMS test remains the same in the worst case or is significantly lower in the best case. For this reason, we rely on the one-tailed variant of the paired test and observe only whether the decrease in average test results, and consequently the increase in performance, can be considered statistically significant. To maintain comparability with the original study by Georgi et al. (2023), we use the threshold \(p<0.05\) to reject the null hypothesis of the tests performed.
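The one-tailed paired test can be sketched with the standard library alone. This is an illustration of the statistic, not the analysis script used in the study: the function names are hypothetical, and instead of computing a p-value the sketch compares the t statistic with a tabulated critical value, here 1.771 for a one-tailed test at the 0.05 level with 13 degrees of freedom (14 paired observations).

```python
import math
from statistics import mean, stdev

# Tabulated one-tailed critical value for alpha = 0.05 and df = 13.
T_CRIT_DF13 = 1.771


def paired_t_statistic(pre, post):
    """t statistic of the paired differences pre - post.

    Positive values indicate lower (better) post-test scores.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(pre, post)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error


def is_significant_decrease(pre, post, t_crit=T_CRIT_DF13):
    """One-tailed decision: did the scores decrease significantly?

    `t_crit` must match the degrees of freedom, len(pre) - 1.
    """
    return paired_t_statistic(pre, post) > t_crit
```

Where SciPy is available, `scipy.stats.ttest_rel(pre, post, alternative="greater")` returns the corresponding one-tailed p-value directly.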

In Table 1 we first present the average results and the variance of the absolute asynchrony of the played rhythmic patterns. We can see that both the easy and the complex rhythmic patterns show a statistically significant change between the pre-test and post-test sessions, which is not the case for the moderate rhythmic patterns. Although we cannot reject the null hypothesis for all three pattern groups, the results do show a significant change in rhythmic performance after playing Steady the Drums! for 14 days.

Table 1 Absolute asynchrony in performed rhythmic patterns

We also present the average results of relative asynchrony in the performed Tapping-PROMS battery test in Table 2. Relative asynchrony focuses on the structural correctness of the performed pattern rather than on the accuracy of each performed tap. In this case, only the simplest rhythmic patterns showed a statistically significant change between the pre- and post-test sessions, while the moderate and complex rhythmic patterns remained similar in their means and standard deviations.
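
The distinction between the two measures can be made concrete with a small sketch. The exact scoring formulas of the Tapping-PROMS are not reproduced in this section, so the definitions below are only one plausible reading and should be treated as assumptions: absolute asynchrony is taken as the mean absolute onset difference after aligning the first onsets, and relative asynchrony as the mean absolute difference between inter-onset intervals normalised by pattern duration. The function names are hypothetical.

```python
def absolute_asynchrony(stimulus, taps):
    """Mean absolute onset difference (seconds) after aligning first onsets.

    Assumed definition for illustration; sensitive to the timing of every tap.
    """
    s0, t0 = stimulus[0], taps[0]
    diffs = [abs((t - t0) - (s - s0)) for s, t in zip(stimulus, taps)]
    return sum(diffs) / len(diffs)


def relative_asynchrony(stimulus, taps):
    """Mean absolute difference between normalised inter-onset intervals.

    Assumed definition for illustration; insensitive to overall tempo, so it
    reflects whether the structure of the pattern was reproduced.
    """
    def normalised_iois(onsets):
        iois = [b - a for a, b in zip(onsets, onsets[1:])]
        total = sum(iois)
        return [ioi / total for ioi in iois]

    s, t = normalised_iois(stimulus), normalised_iois(taps)
    return sum(abs(a - b) for a, b in zip(s, t)) / len(s)


# Hypothetical onsets in seconds: the same pattern tapped at half the tempo.
stimulus_onsets = [0.0, 0.5, 1.0, 1.5]
tap_onsets = [0.0, 1.0, 2.0, 3.0]
print(absolute_asynchrony(stimulus_onsets, tap_onsets))
print(relative_asynchrony(stimulus_onsets, tap_onsets))
```

Under these assumed definitions, a participant who reproduces the correct pattern at the wrong tempo would score poorly on absolute asynchrony but well on relative asynchrony, which matches the distinction between tap accuracy and structural correctness drawn above.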

Table 2 Relative asynchrony in performed rhythmic patterns

Lastly, we present the results of the tempo patterns in the performed Tapping-PROMS battery tests in Table 3. As before, participants were measured on three tempo patterns of varying difficulty. The results show no notable change in the average asynchrony for any of the three performed patterns.

Table 3 Tempo patterns

5.3 Questionnaire results

Fig. 8 Aggregated results from the user experience questionnaire

For the post-study questionnaire, we used both the User Experience Questionnaire (UEQ) and additional questions related to the participants’ experience with the HMD, the educational scenarios, and the overall VR experience. In Fig. 8 we first present the aggregated UEQ results, which show the pragmatic, hedonic, and overall quality compared to established benchmarks. Pragmatic quality encompasses the efficiency, dependability, and perspicuity of the system, while hedonic quality consists of the stimulation and novelty provided by the system. Our aggregated results show that our VR experience received an above-average score in hedonic quality and a slightly below-average score in pragmatic quality, but an overall above-average score. We consider these results positive given that most participants had very little or no experience with VR environments, which likely contributed to the lower score in the pragmatic quality of the system.
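
The aggregation into pragmatic, hedonic, and overall quality can be sketched as follows. The sketch assumes the short form of the UEQ with eight items answered on a 1 to 7 scale, where the first four items are pragmatic and the last four are hedonic, and where answers are rescaled to the range from -3 to +3. The item layout, the function name, and the sample answers are assumptions for illustration and are not taken from the study.

```python
from statistics import mean


def ueq_short_scores(responses):
    """Aggregate UEQ short-form answers into three quality scores.

    responses: one list of eight answers (each 1..7) per participant.
    Assumed layout: items 1-4 pragmatic, items 5-8 hedonic.
    """
    # Rescale 1..7 to -3..+3 so that 0 is the neutral midpoint.
    rescaled = [[answer - 4 for answer in r] for r in responses]
    return {
        "pragmatic": mean(mean(r[:4]) for r in rescaled),
        "hedonic": mean(mean(r[4:]) for r in rescaled),
        "overall": mean(mean(r) for r in rescaled),
    }


# Hypothetical answers from two participants, NOT data from the study.
answers = [
    [4, 4, 4, 4, 7, 7, 7, 7],
    [5, 5, 5, 5, 6, 6, 6, 6],
]
print(ueq_short_scores(answers))
```

The resulting scale means are what would then be compared against the published benchmark intervals to obtain labels such as "above average" or "below average".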

We also observed to what degree the VR headset and controllers caused difficulties and discomfort to the participants. Results in Table 4 show that most participants found the devices relatively easy to use. We also inquired about the average playtime session of the participants, revealing that most played for approximately 10 to 15 min per session.

Table 4 VR experience questions

The last part of the questionnaire focused on the participants’ experience with the educational scenarios. The results in Table 5 show that most participants found the last educational scenario to be both the least fun and the most difficult. The remaining three scenarios received similar ratings for fun, with the rhythm following scenario scoring the highest. Participants found the basic tap and rhythm following scenarios to be the easiest, while the Changing tap scenario proved more challenging for several participants. Lastly, participants were asked whether they could see themselves continuing to play our VR game after the end of the user study. Twelve participants answered positively and two answered negatively.

Table 5 Experience with educational scenarios

5.4 Analysis of gameplay data

We systematically gathered a large amount of gameplay data during the testing period, in which each participant played the VR game for 14 days. Figure 9 shows the playtime over the 14-day period. Some participants deviated from this schedule by playing for fewer or more than 14 days, with occasional days without play.

Fig. 9 Cumulative playtime over the 14-day period, with some deviation in the time span for individual participants (the shortest evaluation lasted 11 days, the longest 22 days)

Figure 10 shows the difference in drum hit accuracy between children who attended music school and children who did not. However, the data also show that older children have more experience in music school training, so their performance may be partly due to their age.

Fig. 10 Comparison of average correct hit rate, split between music-school and no-music-school subgroups

Participants were encouraged to play the VR game daily, but no restrictions were placed on their session times. The daily playtimes are displayed in Fig. 11. The majority of participants played between 7 and 15 min, with an average daily playtime of 10 min and 7.9 s.

Fig. 11 Histogram of play session durations in minutes per day

6 Discussion

In the experiment, we focused on a younger audience and addressed the importance of a positive user experience and a gamified environment for the efficacy of a virtual environment for music learning. The division of the participants into musician and non-musician subgroups should be interpreted with caution due to the small sample size of only 14 participants. Despite the small sample, the differences between these two subgroups suggest an effect that may be significant. The limited size of the sample group is primarily attributed to resource constraints, notably the availability of VR equipment, as well as the willingness of both child participants and their parents to participate in the study, given that the process itself can be time-consuming due to the extensive preparation, the need for consistency, and the completion of questionnaires.

The results of the Tapping-PROMS tests showed statistically significant improvements in the simple and complex rhythmic patterns in terms of rhythmic ability, with more marked improvements in rhythmic reproduction performance than in the overall structural correctness of the executed patterns. In contrast, the moderate-complexity rhythms showed improvements only in the absolute asynchrony results, but these did not reach statistical significance. An intriguing aspect lies in the differences observed in complex rhythms. Absolute asynchrony results showed significant improvement, whereas performance on relative asynchrony tests deteriorated. To interpret these results, we performed an analysis of the rhythm samples used in the Tapping-PROMS tests. As illustrated in Fig. 12, the assigned level of complexity of these examples is somewhat debatable. This is because the chosen complex rhythms contain repeated subpatterns, whereas the moderate complexity rhythms do not. As outlined by Law and Zentner (2012), moderately difficult test stimuli were changed on the upbeat note and complex trials were rhythmic patterns consisting of sixteenth notes, with test trials having rhythm alterations on sixteenth notes. However, as shown in the example, the first measure of complex rhythms (R_C2 and R_CS1) consists of two rhythmically identical parts, while the measures of moderate rhythms (R_M1 and R_MS2) do not. The presence of repeated shorter patterns in the complex rhythms might contribute to better memorability and, subsequently, more precise reproduction of the pattern, resulting in better test performance. Therefore, it is important to acknowledge that the assessment of complexity is rather subjective. 
While the authors of the test have provided some insight into the evaluation of pattern complexity, it seems that the overall structure of the pattern could significantly influence its complexity. It should therefore be considered when evaluating complexity, in addition to the originally considered alterations of the beat division or accent distribution.

Fig. 12 A comparison of rhythm samples

The tempo portion of the Tapping-PROMS tests showed no significant changes between the pre-test and post-test sessions. Interestingly, the test items of the Tapping-PROMS correspond to the learning scenarios for tempo adjustment, which were rated as the easiest of the four scenarios.

Not surprisingly, participants engaged differently with our VR game due to the uncontrolled environment during 14 days of individual use. Nevertheless, the results showed that participants engaged with the VR game on an almost daily basis. Moreover, their engagement with the game showed an improvement in their rhythmic skills without the need for direct supervision by a teacher. This observation is significant because it not only highlights the potential of VR to provide a learning experience similar to that of a simulated teacher environment, as highlighted by Moth-Poulsen et al. (2019), but also underscores its ability to provide entertaining experiences that motivate participants to engage in self-directed learning.

The UEQ results show above-average scores, but there is room for improvement, especially in the pragmatic quality aspect, which was below average. Recommendations from the participants include incorporating more diverse levels and introducing additional gamified elements to improve the experience. Additional questions arose during the evaluation of the questionnaire results, particularly in relation to how to address those participants who did not fully enjoy the VR experience. In discussions with the participants, issues with the head strap and the fixed three-step pupillary distance adjustment were raised. It is worth noting that the new VR set, the Meta Quest 3, will feature an improved lens type that should improve the overall user experience, while the uncomfortable head strap can be addressed by third-party accessories.

In a previous comprehensive study conducted by Johnson et al. (2020), the results indicated that transferring music training from a guided Mixed Reality (MR) environment to real-world music training is challenging, as participants may overly rely on visual cues. In our development, we followed their findings and placed more emphasis on auditory cues in the learning scenarios. Even with lower immersion due to fewer visual cues, participants were able to improve their rhythmic skills. This is a promising result that, combined with the positive user experience feedback and engagement, supports the claim that the VR platform can be useful as a musical learning tool. Furthermore, the implementation of social gamification aspects, similar to those already available on the Troubadour platform (leaderboards, badges, avatars, points, etc.), should further improve user engagement and consequently lead to an increase in users’ skills and performance.

7 Conclusion and future work

The use of virtual reality interfaces for music theory and rhythmic ear training holds much potential as a novel e-learning environment. Our study with the adapted version of the VR game Steady the Drums! has shown that multimodal interaction through virtual reality environments can provide an efficient immersive learning experience for elementary school children through self-learning scenarios and without the need for formal prior musical knowledge. The goal of our user study was to observe the effectiveness of a gamified VR environment for learning rhythmic perception and performance skills. By using a variation of the established PROMS music test, we were able to measure the usefulness and effectiveness of the proposed VR game as a learning alternative for rhythmic skills.

The comprehensive 14-day study, augmented with pre- and post-assessments, provided solid evidence of the positive effects of playing a VR game on the participants’ rhythmic skills. The VR environment received positive user feedback measured through both the UEQ and additional game-specific questions. Most importantly, participants were motivated to play our learning scenarios at home without direct supervision, showing the potential of VR to support what are often more mundane music practice tasks. Our current work encompasses designing a long-term experiment, in which other music perceptual skills (pitch perception, short-term memory) will be observed. Some of the challenges encountered in the study described in this paper remain to be tackled: gathering a sufficient number of devices for concurrent evaluation, motivating participants to take part in a longer experiment, addressing the participants’ age and reading-related comprehension challenges for younger (6–8 year old) participants, and expanding the scenarios to retain the participants’ motivation over a longer time period.

Looking to the future, this innovative VR approach to e-learning in music offers a variety of exciting possibilities. The adaptability of the game makes it versatile for other aspects of music theory learning beyond its current application, creating an engaging learning platform. By incorporating elements of social gamification such as leaderboards, badges, and additional levels, the game could be extended for longer engagement and serve as an additional learning resource over longer periods of time, which is particularly useful for distance learning. The experimental version of the Steady the Drums! game is already available for free in the Oculus Lab Store and is currently accessible through App Lab, as it has not yet gone through the full Meta Quest review process. We plan to release the learning version developed in this work as an update.

This form of immersive learning could change the way teachers view current conventional approaches to teaching music theory and provide students with a fun, engaging, and effective way to develop their musical skills. In addition, this approach does not require teacher supervision or control, making it very useful for home training. As VR technology continues to advance and becomes more accessible, it is clear that its application in music theory education will only become more widespread and used in various scenarios, such as self-learning and distance learning.